Introduction

This report is a follow-on to RADC-TR-75-73, "Timing Figures for
Inverting Large Matrices Using the STARAN Associative Processor" [1].
In that report an algorithm and timing figures were developed for the
inversion of the A matrix in the system of linear equations

AX = B.

In this report the results of another phase of the overall effort are
presented: that of actually inverting matrices using Gauss elimination
on STARAN. For introductory material on the project, associative
processor applications and matrix inversion, refer to [1], which is
available from the Defense Documentation Center in Alexandria,
Virginia (AD A009643).

In this report some background on STARAN is provided, along with the
utilization of the Rome Air Development Center Associative Processor
(RADCAP) facility, of which STARAN is a part. The matrix inversion
process in general, and a matrix inversion application program in
particular, are also discussed.

RADC STARAN

The intent of this section is to acquaint the reader with RADCAP.
To accomplish this it is divided into three subsections. In the first
subsection an overview of the STARAN architecture is presented; next,
the functional units utilized by the matrix inversion program are
discussed in detail; and the final subsection gives the procedure for
submitting a job to STARAN.

1. An Overview

Shown in Figure 1 is a general block diagram of the STARAN
architecture. As indicated in the figure, the basic components are the
associative array, response store, common register, AP control unit,
AP control memory, sequential control, sequential control memory,
external function logic, PIO control, and PIO flip network. In this
report the term Associative Processor will be used to include all of
these units.

The associative array is a hardware unit in which data are stored.
With STARAN the basic array size is 256 x 256. That is, there are 256
words, and each word is 256 bits in length. The RADC STARAN
configuration has four (4) arrays, thus yielding a total array size of
1024 words with each word being 256 bits in length, or
1024 x 256 = 262,144 total bits of array memory.

Associated with each word of the array is a three-bit response store.
It provides arithmetic capabilities, read/write capabilities and an
indication of the results of logical operations. It is also used for
masking words that are not desired in a particular parallel operation.

The common register is 32 bits long and is used in various arithmetic
and search operations on the data in the array. It is also used for
loading the array on a "parallel by bit," "serial by word" basis.

Among other things, the AP control unit directs the execution of
the instructions that are stored in the AP control memory, which
contains all or part of the user program that is being executed.

The PIO control unit controls the PIO flip network, which can shift
and rearrange data so that parallel arithmetic, search and logic
operations can be performed in a variety of ways among the words of the
array modules assigned to it. While AP control is processing data in
some array modules, PIO control can input and output data in other
array modules.

System diagnostics and peripheral devices are handled by sequential
control, a Digital Equipment Corporation (DEC) PDP-11 minicomputer.
Associated with sequential control is the sequential control memory.

Synchronization of the three control units (AP control, PIO control
and sequential control) is coordinated by the external function (EXF)
logic.

The AP has a large number of search and arithmetic instructions. In
terms of search, there are several instructions that process all
activated words in the array. Some of these are equal (exact match),
next higher, next lower, maximum, minimum, less than or equal to, less
than and greater than. As an example, consider an exact match search.
The word that is to be used as the search criterion is placed in the
common register. (Actually, the common register may have to be loaded
up to eight times since it is only 32 bits long.) The word is then
compared on a bit slice basis with each activated word in the array.
Those that match on all 256 bit slices (or any prescribed subset of the
256) will be flagged in the response store. These words will have the
same content as the one in the common register (for the prescribed
subset of the 256). The responders can then be processed as prescribed
by the program. The other search instructions are processed in a
similar fashion.
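
The exact match search can be pictured with a small software model. The sketch below is only an illustration in Python, not STARAN code: the function name `exact_match`, the representation of array words as Python integers, and the convention that bit 0 is the least significant bit are assumptions made for the example.

```python
from typing import List

WORD_BITS = 256  # length of one array word, as stated in the text


def exact_match(words: List[int], active: List[bool], comparand: int,
                first_bit: int = 0, num_bits: int = WORD_BITS) -> List[bool]:
    """Return one responder flag per word (hypothetical model, not STARAN code).

    The comparison proceeds one bit slice at a time: at each step the same
    bit position of every still-matching word is tested against the comparand.
    """
    responders = list(active)  # words that are not activated never respond
    for bit in range(first_bit, first_bit + num_bits):
        wanted = (comparand >> bit) & 1
        # In hardware this inner loop is one parallel step over all words.
        for index, word in enumerate(words):
            if responders[index] and ((word >> bit) & 1) != wanted:
                responders[index] = False
    return responders


if __name__ == "__main__":
    stored = [5, 7, 5, 1]
    activated = [True, True, False, True]
    # Full-word comparison, then a comparison restricted to a 2-bit subset.
    print(exact_match(stored, activated, comparand=5))
    print(exact_match(stored, activated, comparand=5, num_bits=2))
```

The number of steps in this model depends on the number of bit slices compared and not on the number of words, which is the property the search instructions exploit.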

The arithmetic instructions include add, subtract, multiply or
divide one field by another field and place the result in a third field.
This is done on an intraword basis with all activated words taking part
in the operation simultaneously. Also, these operations can be performed
between the common register and any prescribed field in the array. As
an example, suppose there are two columns of numbers x_1,...,x_1024 and
y_1,...,y_1024 and one would like to perform the following operation:

z_i = x_i + y_i,  i = 1,...,1024.

With a single instruction, all pairs of x and y values could be added
simultaneously and the resulting z values placed in a third field.

In addition, one can add, subtract, multiply or divide any field by
a scalar. For example, suppose one would like to obtain 2x_i for
i = 1,...,1024. In this case one would multiply the x_i by the number in
the common register (two), and obtain the result in the same or an
additional field.
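
The field arithmetic described above (z_i = x_i + y_i over all activated words, and multiplication of a field by a scalar held in the common register) can be modelled as follows. This is a minimal Python sketch under stated assumptions: the names `add_fields` and `scale_field` are hypothetical, fields hold non-negative integers, results wrap modulo 2**width, and words that are not activated are simply left at zero in the result. It does not reproduce the system-supplied macros.

```python
from typing import List


def add_fields(x: List[int], y: List[int], active: List[bool],
               width: int = 32) -> List[int]:
    """Bit-serial, word-parallel addition z = x + y (mod 2**width)."""
    z = [0] * len(x)
    carry = [0] * len(x)
    for bit in range(width):            # one pass per bit slice
        for w in range(len(x)):         # a single parallel step in hardware
            if not active[w]:
                continue                # inactive words stay at zero here
            a = (x[w] >> bit) & 1
            b = (y[w] >> bit) & 1
            z[w] |= (a ^ b ^ carry[w]) << bit
            carry[w] = (a & b) | (carry[w] & (a ^ b))
    return z


def scale_field(x: List[int], scalar: int, active: List[bool],
                width: int = 32) -> List[int]:
    """Multiply every active word by one scalar using shift-and-add."""
    mask = (1 << width) - 1
    result = [0] * len(x)
    for bit in range(width):
        if (scalar >> bit) & 1:
            partial = [(value << bit) & mask for value in x]
            result = add_fields(result, partial, active, width)
    return result


if __name__ == "__main__":
    everyone = [True, True, True]
    print(add_fields([1, 2, 3], [10, 20, 30], everyone))
    print(scale_field([1, 2, 3], 2, everyone))
```

In this model the cost of an addition grows with the field width (32 bit slices) rather than with the number of words, which is why 1024 additions can be treated as one instruction.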

2. STARAN Details

Shown in Figure 2 is a block diagram of STARAN as utilized by the
matrix inversion program. Each of the blocks is discussed in detail
in the following paragraphs.

The matrix inversion program and the matrix data are stored as
segments in MULTICS. At execution time, they are moved from MULTICS
to the PDP-11 disk. The program and data for STARAN execution are
loaded into AP control memory from the PDP-11 disk. After execution,
output is moved from STARAN via the PDP-11 disk to MULTICS and stored as
a MULTICS segment. Details of these steps are included in the third
subdivision of this section.

In addition to providing the communication link between MULTICS and
STARAN, the PDP-11 has other functions. Assembly and debugging of STARAN
programs are handled by the PDP-11, as are housekeeping functions,
STARAN maintenance and STARAN diagnostics. Also, the PDP-11 provides
the means to initially load AP control memory with the previously
assembled object module and any required data.

As shown in Figure 3, AP control memory is partitioned into several
sections: the Page Memories, the High Speed Data Buffer (HSDB) and Bulk
Core Memory. The three Page Memories each have 512 32-bit words and
use solid-state elements which have cycle times of less than 200
nanoseconds. Page 0 is used to store a subroutine library. Pages 1 and 2
are used in a ping-pong fashion; that is, AP control reads instructions
from one page while the other page is being loaded by the program pager.
Each memory has a port switch to prevent premature access before it is
loaded.
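
The ping-pong use of Pages 1 and 2 is a form of double buffering. The following Python sketch illustrates the idea only; the function name `run_with_ping_pong` and the modelling of a program as a list of integers are assumptions, and the loading that the program pager performs concurrently in hardware is represented here as a step taken just before the active page is read.

```python
from typing import List, Tuple

PAGE_WORDS = 512  # size of one page memory, as stated in the text


def run_with_ping_pong(program: List[int],
                       page_words: int = PAGE_WORDS) -> Tuple[List[int], List[int]]:
    """Execute `program` page by page, alternating between pages 1 and 2.

    Returns the instructions in execution order and the page number used
    for each chunk (hypothetical model, not the actual pager).
    """
    chunks = [program[i:i + page_words]
              for i in range(0, len(program), page_words)]
    pages = {1: [], 2: []}
    executed: List[int] = []
    page_log: List[int] = []
    if not chunks:
        return executed, page_log

    active = 1
    pages[active] = chunks[0]                # initial load of page 1
    for n in range(len(chunks)):
        idle = 2 if active == 1 else 1
        if n + 1 < len(chunks):
            pages[idle] = chunks[n + 1]      # pager fills the idle page
        executed.extend(pages[active])       # AP control reads the active page
        page_log.append(active)
        active = idle                        # the two pages swap roles
    return executed, page_log


if __name__ == "__main__":
    order, log = run_with_ping_pong(list(range(1200)))
    print(log)
```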

The HSDB is a 512-word, 32-bit memory which also uses solid-state
elements whose cycle times are less than 400 nanoseconds. Its purpose
is to provide a convenient place to store data and instructions for
quick access.

The Bulk Core Memory has 16,384 32-bit words of storage; it is
composed of nonvolatile core with a cycle time of less than one
microsecond. Bulk Core Memory is used to store the assembled program and
any data required by the program.

The primary function of AP control is to control the associative
arrays as directed by instructions stored in AP control memory. Shown
in Figure 4 is a detailed block diagram of AP control as it appears in
[4] on page 1-10. AP control fetches an instruction from AP control
memory and places it in a 32-bit instruction register; the address of


1) Instruction Register: The instruction register contains the
instruction being executed, loaded from AP control memory via
the instruction bus.

2) Program Control: The program control logic, composed of the
program counter (PC), the start loop marker, the end loop marker,
the comparator and the status register (IMASK), controls the
sequence in which instructions are obtained from AP control
memory.

3) Bus Logic: A common data path for all pertinent registers of
AP control and the data bus from memory is provided by the bus
logic. The bus is 32 bits wide and the bus shift logic can be
used to shift data as it is moved.

4) Block Transfer Control: The block transfer control is composed
of two registers. The Data Pointer Register (DP) contains the
control address for the data bus for block transfers. The
Block Length Counter (BL) controls the length of a data block
transfer. Both of these registers are 16 bits in length. In
addition to the above function, they can be combined together
or used individually for incrementing or decrementing counters
stored in AP control memory.

5) Common Register: Data to be loaded into the arrays from AP
control memory and output data from the arrays to AP control
memory pass through the common register. In addition, arguments
for search operations are placed in the common register. The
common register is also used to broadcast input to the response
registers of all four arrays simultaneously.

6) Field Pointers and Length Counters: Field pointers one and
two (FP1, FP2) are used for indirect addressing of the arrays;
FP1 is loaded with the array number and FP2 is loaded with the
number corresponding to the word or bit slice desired. The two
field length counters (FL1, FL2) are used as counters for
branching and loop instructions. Field pointers three and
E (FP3, FPE) are used for array bit or word addresses; FPE is
also used during multiply and divide routines and to store shift
constants. All of the above registers are 8 bits long.

7) Response Store Control: The control signals required by the
associative arrays and buffers for correct timing are generated
by the response store control. The response store control
consists of the control line conditioner and control line
buffer.

8) Array Control: The array control logic selects the arrays to
be used and controls such things as bit/word mode, mask
operations and shifting. The array select register (AS) is a
32-bit register used to enable the desired array(s). A one
in the bit position corresponding to the array number enables
the array. Bit slice or word slice addressing is controlled by
the array address mode. The shift logic required for shifting
and mirroring operations is generated by control signals from
the flip/shift control.

9) Resolver: Resolver logic is used to find the array address
and word address of the first responder of some search o'

As indicated earlier, the four associative arrays are each organized
as a square 256 words by 256 bits of solid-state storage. Access to the
arrays is possible in either the word or the bit direction. The arrays
may be operated on individually, all at once or in various combinations
as enabled by the array select register discussed above. Within each
array, it is possible to operate on all or just part of the array by
using a masking operation. Masking in the word direction is done by
loading the M response store register with ones in the position of the
words to be used; masking in the bit direction is accomplished by
directing the common register to operate on a specified field. The
X and Y response store registers can be used to logically combine data
for storage into the arrays or the M register. In the application
program discussed subsequently, we make frequent use of the X and Y
registers for data movement and for loading the M register.

3. Utilizing RADC STARAN

There are four required steps for utilizing STARAN at RADC. First,
the source code must be written and stored as a MULTICS segment; second,
the source program must be assembled, resulting in an object module;
third, the object module must be linked with any required subroutines to
produce a STARAN load module; and fourth, the load module must be
executed.

The first step is to write the source program. Programs to be
executed on STARAN are written in the Associative Processor Programming
Language (APPLE). The MULTICS editors "edm" and "QEDX" can be used to
create the program, edit it, and store it into a segment.

The second step in submitting a job to STARAN is to create a MULTICS
segment which contains PDP-11 batch Job Control Language (JCL) statements
[6]. This segment will reference an APPLE source program, another
MULTICS segment, calling for it to be moved from MULTICS to PDP-11
disk 0. If the program contains or references any mnemonics, then MAPPLE
is called to translate the source program into an APPLE object module.
MAPPLE is a collective term used to indicate that both the MAPPLE
preprocessor for translating mnemonics and the APPLE assembler are
executed [2,3]. The newly created object module can be stored on disk 1
of the PDP-11 or as a MULTICS segment. A sample MULTICS segment to
accomplish the above is included in Appendix A1.

The third step is to create a MULTICS segment to run the APPLE linker,
ALINK [5]. Its purpose is to accept multiple object modules as input;
relocate those object modules and assign them absolute addresses; resolve
symbolic references among them; create an overlay structure upon request;
generate a load map to indicate the absolute addresses of the load module
and the entry point of each load module; and produce executable code,
the STARAN load module, in a format suitable for loading and execution.
The STARAN load module is then stored on disk 1 until required. An ALINK
program and load map are included in Appendix A2.

The last step is the execution of the STARAN load module. Again,
a MULTICS segment is created with the appropriate JCL statements. In
this case the machine is instructed to execute the STARAN Debug Module
(SDM), which is a system program to detect and locate errors in an
application program [5]. The functions provided by SDM include the
ability to dump the contents of memory locations; to inspect and change
a memory location or register; and to print a table of preselected
memory locations and/or registers in AP Control Memory, Parallel I/O
Control Memory and Array Memory. The functions desired are included in
the MULTICS segment to be submitted to STARAN. An example program is
included in Appendix A3.

Matrix Inversion

The algorithm utilized for matrix inversion in this work is basically
Gaussian elimination adapted for use on an associative processor. Column
operations are performed rather than row operations, and only one row of
the identity matrix is appended at any given time. The reader is referred
to [1] for an example that illustrates this method.

Because large matrices are to be inverted, it is desirable in
testing to use a matrix which can be generated automatically and whose
inverse is known. Matrices generated with the pattern illustrated in
Figure 5 have inverses with the pattern illustrated in Figure 6.
Thus, matrices of this form were chosen for testing.

The system-supplied macros for multiplication, division and subtraction
require three empty fields for intermediate operations. Therefore, the
largest matrix which will fit into the four arrays for inversion is
63 x 63. A 60 x 60 matrix was chosen since it is large enough to
nearly fill all of the arrays. The data are loaded into the arrays as
shown in Figure 7. The numbers in the arrays indicate the columns
of the original matrix and the numbers to the left of the arrays
indicate the word number. For example, column 1 of the original matrix is
replicated four times in field A of each array starting at words 0, 64,
128 and 192. The letters at the top of the figure are the names assigned
to the 32-bit fields.

     -1    1    0    0
      1   -2    1    0
      0    1   -2    1
      0    0    1  -3/4

INVERSE OF 4 X 4 MATRIX

 -1  1  0  0  0  0  0  0  0  0  0  0  0  0  0
  1 -2  1  0  0  0  0  0  0  0  0  0  0  0  0
  0  1 -2  1  0  0  0  0  0  0  0  0  0  0  0
  0  0  1 -2  1  0  0  0  0  0  0  0  0  0  0
  0  0  0  1 -2  1  0  0  0  0  0  0  0  0  0
  0  0  0  0  1 -2  1  0  0  0  0  0  0  0  0
  0  0  0  0  0  1 -2  1  0  0  0  0  0  0  0
  0  0  0  0  0  0  1 -2  1  0  0  0  0  0  0
  0  0  0  0  0  0  0  1 -2  1  0  0  0  0  0
  0  0  0  0  0  0  0  0  1 -2  1  0  0  0  0
  0  0  0  0  0  0  0  0  0  1 -2  1  0  0  0
  0  0  0  0  0  0  0  0  0  0  1 -2  1  0  0
  0  0  0  0  0  0  0  0  0  0  0  1 -2  1  0
  0  0  0  0  0  0  0  0  0  0  0  0  1 -2  1
  0  0  0  0  0  0  0  0  0  0  0  0  0  1 -14/15

INVERSE OF 15 X 15 MATRIX


FIGURE 6. Inverses of the Patterned Matrices
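
The inverse pattern of Figure 6 can be checked with exact rational arithmetic. The Python sketch below builds the tridiagonal pattern for any dimension n and multiplies it by a candidate test matrix whose (i, j) entry is max(i, j), counting from 1. That candidate is an assumption inferred from the inverse pattern, since Figure 5 is not reproduced here; the function names are hypothetical.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def patterned_inverse(n: int) -> Matrix:
    """Tridiagonal pattern of Figure 6 for an n x n matrix (n >= 2)."""
    b = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = Fraction(-2)
        if i > 0:
            b[i][i - 1] = Fraction(1)
        if i < n - 1:
            b[i][i + 1] = Fraction(1)
    b[0][0] = Fraction(-1)                   # first diagonal entry
    b[n - 1][n - 1] = Fraction(-(n - 1), n)  # -3/4 for n = 4, -14/15 for n = 15
    return b


def candidate_matrix(n: int) -> Matrix:
    """Assumed test matrix: entry (i, j) is max(i, j), 1-based."""
    return [[Fraction(max(i, j)) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(m: Matrix) -> bool:
    n = len(m)
    return all(m[i][j] == (1 if i == j else 0)
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    for size in (4, 15):
        product = matmul(candidate_matrix(size), patterned_inverse(size))
        print(size, is_identity(product))
```

Because both the matrix and its inverse follow a simple rule, test cases of any dimension can be generated and verified without storing reference data.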


Although there are more efficient ways to load the data for smaller
matrices, by following this loading pattern the matrix inversion program
presented in this work can be easily adapted to handle any matrix up to
and including a 63 x 63. For example, this program was used to invert a
45 x 45 matrix and a 30 x 30 matrix. In the case of the 30 x 30 matrix,
the data could have been fully loaded into array 0 as shown in Figure 8.
However, this would have required writing a new program. By following
the loading pattern shown in Figure 7, the matrix was loaded into
arrays 0 and 1 and inverted with only slight modification of the original
code. The disadvantage in following the loading pattern of Figure 7 is
that the inversion will take longer because this program includes setting
the registers first to enable array 0 and then to enable array 1, whereas
the more efficient configuration in which the data are completely
contained in array 0 would use only array 0 and therefore eliminate
several steps. However, in the case of the 45 x 45 matrix, three arrays
would still be required to contain all the data.

As mentioned previously, the largest matrix which can be fully
contained in the four arrays and inverted using the Gaussian elimination
algorithm referred to previously is a 63 x 63 matrix. Matrices which
are larger than 63 x 63 require that the data be operated on in stages.
For example, an 80 x 80 matrix was inverted as part of this study. The
algorithm to invert the 80 x 80 matrix first loads forty-nine columns
into the arrays and performs the column operations; then, after reading
these results back into AP Control Memory, the remaining thirty-one
columns are loaded into the arrays and column operations are again
performed. These results are read into AP Control Memory, the first
group of columns is again loaded into the arrays, and the algorithm
proceeds as above. The 80 x 80 matrix was chosen since it was necessary
to consider a matrix large enough to require more than the STARAN array
capacity, yet small enough to be contained in AP Control Memory.

Matrix Inversion Application Program - General

In this discussion the following notation will be used. Elements
of the original matrix will be designated by a_{i,j}, where i is the row
number and j the column number. Recall that the numbers in the boxes of
the array diagrams represent the column number of the original matrix.
Masks will be denoted as MI(J), where I is the array number (0,1,2,3) and
J the position of the mask in the array (1,2,3,4). For example, M0(3)
means that array 0 has a mask in the third quarter, that is, words 128
to 191.
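
The mask notation can be made concrete with a short model. The Python sketch below builds the word mask for position J of one array and lists the words it selects; the function names are hypothetical, and representing the M register as a list of 0/1 flags is an assumption made for illustration.

```python
from typing import List

WORDS_PER_ARRAY = 256
QUARTER_WORDS = 64  # each mask position covers one quarter of an array


def quarter_mask(position: int) -> List[int]:
    """Return 256 flags with ones in quarter `position` (1, 2, 3 or 4)."""
    if position not in (1, 2, 3, 4):
        raise ValueError("position must be 1, 2, 3 or 4")
    first = (position - 1) * QUARTER_WORDS
    return [1 if first <= w < first + QUARTER_WORDS else 0
            for w in range(WORDS_PER_ARRAY)]


def selected_words(mask: List[int]) -> List[int]:
    """Word numbers that take part in an operation under this mask."""
    return [w for w, flag in enumerate(mask) if flag]


if __name__ == "__main__":
    words = selected_words(quarter_mask(3))   # the J = 3 case, as in M0(3)
    print(words[0], words[-1])
```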

The data are loaded into the four arrays as previously illustrated
in Figure 7. Each box is sixty-four words in length, thus dividing the
arrays into four equal quarters in the word direction. In the case of
the 60 x 60 matrix, each box will have 60 words of matrix data, one word
of identity and three empty words. The identity row is appended to the
top of the original matrix and will move down through the matrix row by
row as the inversion proceeds.

Each array is divided into eight thirty-two bit fields in the bit
direction. Fields A, B, D, E and F are used to hold matrix data; fields
G and H are used for intermediate results of the arithmetic operations;
and the last field is used to store the masks and to provide a bit slice
required by the arithmetic operations.

The program begins by dividing field A by the first diagonal element
of the matrix, a_{1,1}, to create the identity element (one) in this
diagonal position. Note that the identity row is in word 0. Masks M0(1),
M1(1), M2(1) and M3(1) are loaded. Field A is multiplied by elements
a_{1,2}, a_{1,18}, a_{1,34} and a_{1,50} by successively loading the
above matrix elements into the common register and using the
system-supplied macro to multiply field A by the common register. The
results of the multiplication are placed in field H. Next, masks M0(2),
M1(2), M2(2) and M3(2) are loaded, and field A is successively
multiplied by elements a_{1,6}, a_{1,22}, a_{1,38} and a_{1,54} in the
same way as above. The third step is to load masks M0(3), M1(3), M2(3)
and M3(3) and multiply field A by elements a_{1,10}, a_{1,26}, a_{1,42}
and a_{1,58} as above. Lastly, masks M0(4), M1(4), M2(4) and M3(4) are
loaded and field A is multiplied by elements a_{1,14}, a_{1,30},
a_{1,46} and a_{1,62}. Note that element a_{1,62} does not exist since
the matrix dimension is 60 x 60. At this point field H is full; the
mask is set for all words; and the system macro for subtraction is used
to subtract field H from field B, placing the difference in field G.
Field G is then moved to field B. This results in creating the identity
element (zero) in each of the positions given above. Moving next to
field D, the operations are repeated with the first element of each
column in field D until field H is full of products. Then field H is
subtracted from field D, again placing the difference in field G and
later moving field G to field D. In the same way fields E and F are
operated on. The result of the above procedure is that the identity row
which was originally appended to the top of the matrix has now been
created in the first row of the matrix; what was previously the identity
row is the first of the intermediate results leading to the inverse of
the original matrix.
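
The subscripts quoted for field B (columns 2, 18, 34, 50 under the first mask position and 14, 30, 46, 62 under the fourth) suggest a regular placement of columns. The Python sketch below encodes one placement rule that reproduces those subscripts. It is an inference, not a transcription of Figure 7: in particular, the assumption that consecutive columns within a quarter go to fields B, D, E and F in that order, and the function names, are hypothetical.

```python
from typing import List, Tuple

DATA_FIELDS = "BDEF"   # fields holding columns 2, 3, ... (assumed order)
ARRAYS = 4
QUARTERS_PER_ARRAY = 4
SLOTS_PER_ARRAY = len(DATA_FIELDS) * QUARTERS_PER_ARRAY  # 16 columns


def slot_for_column(column: int) -> Tuple[int, int, str]:
    """Map a column number (2 and up) to (array, mask position, field)."""
    if column < 2 or column > 1 + ARRAYS * SLOTS_PER_ARRAY:
        raise ValueError("column does not fit in this layout")
    offset = column - 2
    array = offset // SLOTS_PER_ARRAY
    within = offset % SLOTS_PER_ARRAY
    position = within // len(DATA_FIELDS) + 1
    field = DATA_FIELDS[within % len(DATA_FIELDS)]
    return array, position, field


def multipliers(field: str, position: int,
                pivot_row: int = 1) -> List[Tuple[int, int]]:
    """Subscripts (row, column) used under one mask position, one per array."""
    result = []
    for array in range(ARRAYS):
        column = (2 + array * SLOTS_PER_ARRAY
                  + (position - 1) * len(DATA_FIELDS)
                  + DATA_FIELDS.index(field))
        result.append((pivot_row, column))
    return result


if __name__ == "__main__":
    print(multipliers("B", 1))
    print(multipliers("B", 4))
    print(slot_for_column(34))
```

Under this rule a 30 x 30 matrix occupies arrays 0 and 1 and a 45 x 45 matrix occupies three arrays, which agrees with the loading behavior described earlier.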

The second phase of the algorithm begins by exchanging the position
of columns one and two; the result is shown in Figure 9. The identity
row is in word 1, and the same procedure as above is used; that is,
first dividing field A by a_{2,2}, then successively multiplying and
subtracting using fields B, D, E and F. The result is the creation of the
identity row in word 2. Columns two and three are exchanged, and the
arithmetic operations are repeated. Then columns three and four are
exchanged, and so forth until each column has been successively placed in
field A. The final arrangement of the data will be as shown in Figure
10; this will be the inverse of the original matrix. A flow diagram of
the general procedure is given in Figure 11.
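
The general procedure can be summarized by a sequential model of elimination by column operations with a single appended identity row. The Python sketch below is a simplified illustration and not the APPLE program: it works on logical column numbers, so the physical exchange of columns into field A is replaced by rewriting the finished matrix row as the next identity row, and it uses exact fractions with no pivot search. The function name is hypothetical.

```python
from fractions import Fraction
from typing import List


def invert_by_column_ops(a: List[List[int]]) -> List[List[Fraction]]:
    """Invert a square matrix using column operations on n + 1 stored rows.

    work[0] starts as the appended identity row; work[1..n] hold the matrix.
    After step k, work[k] holds row k of the inverse (0-based).
    """
    n = len(a)
    work = [[Fraction(1 if j == 0 else 0) for j in range(n)]]
    work += [[Fraction(v) for v in row] for row in a]

    for k in range(n):
        pivot = work[k + 1][k]          # diagonal element of matrix row k
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for r in range(n + 1):          # divide the pivot column by the pivot
            work[r][k] /= pivot
        for j in range(n):              # clear row k in every other column
            if j == k:
                continue
            factor = work[k + 1][j]
            if factor != 0:
                for r in range(n + 1):
                    work[r][j] -= factor * work[r][k]
        # Matrix row k is now an identity row and is no longer needed;
        # its word is reused as the identity row for the next column.
        if k + 1 < n:
            work[k + 1] = [Fraction(1 if j == k + 1 else 0) for j in range(n)]
    return work[:n]


if __name__ == "__main__":
    for row in invert_by_column_ops([[1, 2], [2, 2]]):
        print([str(value) for value in row])
```

Only n + 1 rows are stored at any time, which is the reason a 63 x 63 matrix fits in quarters of 64 words each.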

The reader is referred to Appendix B for a detailed discussion of
the inversion program for a 60 x 60 matrix. The appendix includes a
figure indicating the program variables, a trace map, detailed flow
charts of the program and all subroutines, explanations of the main
program and all the subroutines, and a listing of the main program and
the subroutines.

Timing of Matrix Inversion

For purposes of comparison, the same patterned matrices inverted on
STARAN were inverted using the APL Plus system monadic DOMINO at Syracuse
University. Matrices with dimensions 5, 10, 15, 20, 25, 30, 35, 40 and 45
were inverted; the times, in seconds, for each of these matrices are
summarized in Table I. The program which calculated these times inverted
each matrix ten times and returned the average of the ten inversion
times.

Table I. Timing Figures for Inverting Matrices Using APL at Syracuse University

Rank of Matrix    Time in Seconds

      5               0.017
     10               0.117
     15               0.300
     20               0.683
     25               1.300
     30               2.200
     35               3.300
     40               4.756
     45               6.767

The matrix inversion routines executed on STARAN were broken down
to give the inversion time alone, the time to load AP Control Memory
with the data from the PDP-11 disk 0, the time to load the arrays with
the data and the time to run the complete routine. It should be noted
that for the 30 x 30, 45 x 45 and 60 x 60 matrices the arrays were loaded
once; however, for the 80 x 80 matrix, the arrays were loaded and
unloaded 160 times with intermediate data. Each matrix was inverted five
times, and the timing data were collected for each operation stated
above. These data are given in Tables II-V. The results are summarized
in Table VI.

Shown in Figure 12 is a graph which gives a direct comparison of
the matrix inversion as executed on the two systems. The upper curve is
a plot of the APL Plus inversion times; this curve has been extrapolated
to an 80 x 80 matrix. It was necessary to extrapolate to the larger
matrices since the standard work space provided at Syracuse University
could not invert a matrix larger than 47 x 47. Also included in Figure
12 are points corresponding to the matrix inversion times using STARAN;
these points are indicated by an "x" on the graph. The STARAN times
which are plotted are the inversion times only and do not include loading
AP Control Memory or loading the arrays, with the exception of the second
point for the 80 x 80 matrix. Since data movement in and out of the
arrays is necessary to complete the inversion of the 80 x 80 matrix, the
time to perform these operations is included in the second point to
provide this comparison. It should be noted that when the inversion is
executed using APL Plus, the data are in the work space and formatted in
the correct way, so the APL Plus inversion times do not include any data
movement either.
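
One way to picture the extrapolation of the APL Plus curve is a power-law fit to the Table I data. The Python sketch below fits t = c * n**p by least squares on logarithmic axes and evaluates the fit at n = 80. This is an assumed method chosen for illustration; the report does not say how the curve in Figure 12 was extended, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

# Rank of matrix -> APL Plus inversion time in seconds (Table I).
APL_TIMES: Dict[int, float] = {
    5: 0.017, 10: 0.117, 15: 0.300, 20: 0.683, 25: 1.300,
    30: 2.200, 35: 3.300, 40: 4.756, 45: 6.767,
}


def fit_power_law(samples: Dict[int, float]) -> Tuple[float, float]:
    """Least-squares fit of t = c * n**p on log-log axes; returns (p, c)."""
    xs = [math.log(n) for n in samples]
    ys = [math.log(t) for t in samples.values()]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    coefficient = math.exp(mean_y - exponent * mean_x)
    return exponent, coefficient


def predict(n: int, exponent: float, coefficient: float) -> float:
    """Evaluate the fitted curve at dimension n."""
    return coefficient * n ** exponent


if __name__ == "__main__":
    p, c = fit_power_law(APL_TIMES)
    print(f"fitted exponent: {p:.2f}")
    print(f"extrapolated 80 x 80 time: {predict(80, p, c):.1f} s")
```

Any value obtained this way is an estimate from nine measured points and should be read with the same caution as the extrapolated portion of the curve.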

Summarized in Table VII is a comparison of the inversion times for
the two systems.

Timing Data for a 60 x 60 Matrix (surviving column; times in seconds)

Five runs:  17.9809878  18.0167450  18.0215646  18.0174254  18.0171516
Maximum 18.0215646; minimum 17.9809878; average 18.010776

*This time includes loading and unloading the arrays with intermediate results

♦Times as read from Figure 12

Future Research

Goodyear Aerospace Corporation has recently introduced STARAN Model E,
which is similar in organization to the current STARAN Model B except that
the size of the array is increased to 9216 x 256. The larger array size
will have a tremendous impact on the matrix inversion work. With the
current array size, the largest matrix which can be fully contained in
the array and inverted is 63 x 63. Once the matrix exceeds that size,
the inversion time increases due to the necessary data movement. This
increase was evident in the inversion of the 80 x 80 matrix. Recall
that the arithmetic operations to invert the matrix required 22.689
seconds; however, when the data movement required to accomplish the
inversion is included, the time increased to 33.943 seconds, approximately
a 50% increase in time. Assuming a configuration of four arrays, the
larger array size would completely contain a matrix with dimension 510
and invert that matrix without requiring any intermediate data movement.
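
The 50% figure quoted for the 80 x 80 matrix follows directly from the two times given. The short Python check below shows the arithmetic; the function name is hypothetical.

```python
def percent_increase(base_seconds: float, total_seconds: float) -> float:
    """Extra time as a percentage of the base time."""
    return 100.0 * (total_seconds - base_seconds) / base_seconds


if __name__ == "__main__":
    arithmetic_only = 22.689      # seconds, arithmetic operations only
    with_data_movement = 33.943   # seconds, including data movement
    extra = with_data_movement - arithmetic_only
    print(f"data movement adds {extra:.3f} s")
    print(f"increase: {percent_increase(arithmetic_only, with_data_movement):.1f}%")
```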

Another option being introduced by Goodyear Aerospace Corporation
is a parallel head per track disk (PHD). The addition of the PHD would
provide the capability to load the data into the arrays more rapidly.
Data loading is another operation which adds to the matrix inversion
time. Again recalling the 80 x 80 matrix, to load data into AP Control
Memory from the PDP-11 disk 0 required 7.819 seconds; then from AP Control
Memory, the arrays were loaded. Clearly the capability to directly load
the arrays from a secondary device would be advantageous.

Since  STARAN  now  has  a larger  array  and  the  capability  to  load  and 
unload  data  in  a more  efficient  way,  the  inversion  of  very  large  matrices 
seems  a feasible  problem  and  should  certainly  continue  to  be  investigated. 
The  current  STARAN,  the  PDP-11  minicomputer,  and  the  MULTICS  system  can 
be  used  to  simulate  the  capabilities  of  the  STARAN  Model  E and  its 
peripherals, thus  providing  an  estimate  of  the  time  to  invert  very  large 
matrices  on  STARAN. 
Conclusions


The timing data for matrices inverted on RADC STARAN, as shown in
Figure 12, indicate that the times to invert matrices of various sizes
compare favorably with the times to invert the same matrices sequentially
using APL Plus, and thus support the timing results of RADC-TR-75-73.

Since the expected timing data indicate that the comparison will prove
more favorable as rank increases, a fact supported by the data in this
report, it can be seen that the inversion comparisons will become more
dramatic as the matrix size increases.

The new technological advances in STARAN architecture discussed
previously will provide the system with capabilities which will certainly
have a positive impact on the matrix inversion problem.

Therefore, it can be concluded that the results thus far are promising
and that further investigation into the solution of these types of
problems using STARAN is reasonable.
APPENDIX A


The  Programs  Required  to 
Assemble,  Link  and  Execute  the 
Matrix  Inversion  Program 




A1. The Assembler Program mat.1


The threefold purpose of this job is to move the source program
from MULTICS to disk 0 of the PDP-11, to create an APPLE object module
and an assembled listing of machine instructions with the corresponding
addresses by executing MAPPLE, and finally to store the object module
on disk 1 of the PDP-11 for later reference. These three sections are
delineated by $FI statements; each section is discussed in detail below.


$FI
$JOB PIPMAT1 [64,33]
$RU PIP
#DK0:/EN
#DK1:/EN
#DK1:MTX1.APL/DE
#DK0:<MU:MTX1.APL
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI


Section one, the first job, is started by the JCL statement giving
the job name and user identification code (UIC), JOB PIPMAT1 [64,33] in
this case. The next statement tells the computer to run PIP. PIP
enables the transfer of files from disk to disk in the PDP-11 or from
disk to MULTICS and vice versa, or the deletion of files from a disk.
Disk 0 and disk 1 are enabled by the next two statements to assure access
to them. Since it is required to have the most up-to-date APPLE source
program stored on disk 1, delete the program currently stored. Next,
move the source program MTX1.APL from MULTICS to disk 0. Note the
extension .APL, which is used to indicate an APPLE source program. The
final two statements update the directory to the PDP-11 disk files which
is stored in MULTICS.


$FI
$JOB MAPPL1 [64,33]
$RUN MAPPLE
#DK0:MTX1.AOB,MU:MTX1.ALS<DK0:MTX1.APL/S
The second section is initialized by the appropriate JCL card.
Next, indicate the intention of running MAPPLE. The third statement
takes the source program, MTX1.APL, which is on disk 0, and executes MAPPLE
to create an object module, MTX1.AOB, on disk 0 and an assembled listing
of the machine instructions with the corresponding addresses, MTX1.ALS,
in a MULTICS segment. Note that the extension .AOB is used to indicate
an object module, and the extension .ALS is used to indicate a listing.


$FI
$JOB PIPOUT [64,33]
$RU PIP
#DK1:MTX1.AOB/DE
#DK1:<DK0:MTX1.AOB
#DK0:MTX1.AOB/DE
#DK0:MTX1.APL/DE
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI


The third and final section is another PIP job initialized by a JCL
statement. First, delete the old object module from disk 1 and then
store the new object module on disk 1 from disk 0. The next two statements
delete the object module and source program from disk 0. Disk 0 is
cleaned periodically and is also used for library routines, and is
therefore not used for storage. Finally, finish the job by updating the
disk directories on MULTICS.
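
The three-section structure described above can be illustrated with a small
Python sketch that splits a job deck at its $FI statements and reports which
system program each job runs. The deck text is the mat.1 listing as read from
a poor scan, so individual statements should be treated as approximate; the
function names split_jobs and summarize are hypothetical and exist only for
this illustration.

```python
# Job deck for mat.1 as read from the listing; spelling of individual
# statements is approximate because the source scan is poor.
DECK = """\
$JOB PIPMAT1 [64,33]
$RU PIP
#DK0:/EN
#DK1:/EN
#DK1:MTX1.APL/DE
#DK0:<MU:MTX1.APL
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI
$JOB MAPPL1 [64,33]
$RUN MAPPLE
#DK0:MTX1.AOB,MU:MTX1.ALS<DK0:MTX1.APL/S
$FI
$JOB PIPOUT [64,33]
$RU PIP
#DK1:MTX1.AOB/DE
#DK1:<DK0:MTX1.AOB
#DK0:MTX1.AOB/DE
#DK0:MTX1.APL/DE
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI
"""


def split_jobs(deck):
    """Split a deck into jobs, using $FI lines as the delimiter."""
    jobs, current = [], []
    for raw in deck.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "$FI":
            if current:
                jobs.append(current)
            current = []
        else:
            current.append(line)
    if current:
        jobs.append(current)
    return jobs


def summarize(job):
    """Return (job name, program run, command lines) for one job."""
    name = program = None
    commands = []
    for line in job:
        if line.startswith("$JOB"):
            name = line.split()[1]
        elif line.startswith("$RU"):       # matches both $RU and $RUN
            program = line.split()[1]
        elif line.startswith("#"):
            commands.append(line[1:])
    return name, program, commands


summaries = [summarize(job) for job in split_jobs(DECK)]
for name, program, commands in summaries:
    print(f"{name}: runs {program}, {len(commands)} command line(s)")
```

Run on the deck above, the sketch reports a PIP job, a MAPPLE job and a
second PIP job, matching the three sections discussed in the text.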

The computer program mat.1 is included in this appendix as Figure 13.


pr mat.1

mat.1          10/13/76  1219.3 edt Wed

$JOB PIPMAT1 [64,33]
$RU PIP
#DK0:/EN
#DK1:/EN
#DK1:MTX1.APL/DE
#DK0:<MU:MTX1.APL
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI
$JOB MAPPL1 [64,33]
$RUN MAPPLE
#DK0:MTX1.AOB,MU:MTX1.ALS<DK0:MTX1.APL/S
$FI
$JOB PIPOUT [64,33]
$RU PIP
#DK1:MTX1.AOB/DE
#DK1:<DK0:MTX1.AOB
#DK0:MTX1.AOB/DE
#DK0:MTX1.APL/DE
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI

r 1219  0.090  0.420  20


Figure 13: The Assembler Program


A2. The Link Program alink

The overall purpose of the alink MULTICS segment is to link together
all object modules needed to run the entire program and to create from
these a load module and load map. Note that alink is the MULTICS segment
name. The alink program causes ALINK, the STARAN APPLE linker processing
program, to be executed. This segment consists of three jobs delineated
by $FI statements, each of which is discussed below.


$FI
$JOB PIP [64,33]
$RUN PIP
#DK0:MTX1.ALD/DE


The first part of the program executes PIP to delete from disk 0
the old STARAN load module, MTX1.ALD. Note that the extension .ALD
is used to indicate a STARAN load module.

$FI
$JOB ALINK [64,33]
$RU ALINK
#DK0:MTX1.ALD,MU:MTX1.AMP<DK1:MTX1.AOB/B:9000
#DK1:MTX2.AOB,DK1:MTX3.AOB,DK1:SUBA.AOB
#DK1:SUBB.AOB,DK1:SUBC.AOB,DK1:SUBD.AOB
#DK0:FLTSUB.AOB
#DK0:FDVC.AOB,DK0:FMPC.AOB,DK0:FLTAS.AOB/E

The second portion of the MULTICS segment is the ALINK job. In this
case, the object modules following are linked together to create the
STARAN load module, MTX1.ALD, and the STARAN load map, MTX1.AMP. The
object modules which are linked together in this program are as follows:


1)  The main program MTX1.AOB

2)  The subroutines
        MTX2.AOB
        MTX3.AOB
        SUBA.AOB
        SUBB.AOB
        SUBC.AOB
        SUBD.AOB

3)  System macros
        FLTSUB.AOB    floating point subtraction
        FDVC.AOB      floating point division
        FMPC.AOB      floating point multiplication
        FLTAS.AOB     required floating point routines


A sample load map, Figure 15, is included in this appendix following
the listing of the ALINK job, Figure 14.

$JOB PIPEND [64,33]
$RU PIP
#DK1:MTX1.ALD/DE
#DK1:<DK0:MTX1.ALD
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI

The last portion of this segment is again a PIP job. Its purpose
is to delete the old load module from disk 1 and then store the new load
module on disk 1. Finally, the directories to the disk files are updated.

A3. The Execution Program sdm

The sdm MULTICS segment is composed of three jobs delineated by $FI
statements. Note that sdm is the name of the MULTICS segment which will
cause SDM, the STARAN Debug Module System program, to be executed. Both
the first and the last jobs are PIP jobs for moving data. The middle
section is the SDM job, which includes several of the debugging features.
Each of these sections is discussed below.


$JOB PIP [64,33]
$RUN PIP
#DK0:SDM/UN
#DK0:SDM/DE
#DK0:MAT60.EXT/DE
#DK0:MAT60.EXT<MU:MAT60

pr alink

alink          10/13/76  1221

$FI
$JOB PIP [64,33]
$RUN PIP
#DK0:MTX1.ALD/DE
$FI
$JOB ALINK [64,33]
$RU ALINK
#DK0:MTX1.ALD,MU:MTX1.AMP<DK1:MTX1.AOB/B:9000
#DK1:MTX2.AOB,DK1:MTX3.AOB,DK1:SUBA.AOB
#DK1:SUBB.AOB,DK1:SUBC.AOB,DK1:SUBD.AOB
#DK0:FLTSUB.AOB
#DK0:FDVC.AOB,DK0:FMPC.AOB,DK0:FLTAS.AOB/E
$FI
$JOB PIPEND [64,33]
$RU PIP
#DK1:MTX1.ALD/DE
#DK1:<DK0:MTX1.ALD
#MU:DIR0<DK0:/DI
#MU:DIR1<DK1:/DI
$FI

r 1220  0.089  0.650  25


Figure 14: The Link Program


MTX1.AMP       10/08/76

MTX1.AMP       ALINK V02-01       08-OCT-76

LOAD MODULE NAME:    MAIN
TRANSFER ADDRESS:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9000   97EB   07EC
     <ABS>           0000   0000   0000

**********
OBJECT MODULE NAME:  MAIN
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9000   927B   027C
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     CONST   901F     COUNTER 901A
                     VALUE   901C     CONST3  9018
                     CONST2  9019     CONST0  901E
                     CONST1  901D

**********
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         927C   93E9   016E
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUB1    927C

**********
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         93EA   953B   0152
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUB2    93EA

**********
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         953C   957E   0043
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUBA    953C

**********
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         957F   95C1   0043
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUBB    957F

**********

Figure 15: The Load Map
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         95C2   9604   0043
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUBC    95C2

**********
OBJECT MODULE NAME:
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9605   9647   0043
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     SUBD    9605

**********
OBJECT MODULE NAME:  FLTSUB
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9648   968F   0048
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     NORM$   9654     SAVE$   9648
                     RSTR$   964E

**********
OBJECT MODULE NAME:  FDVC
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9690   96EC   005D
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     FDVCE$  96C8     FDVC$   9690

**********
OBJECT MODULE NAME:  FMPC
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         96ED   9738   004C
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     FMPC$   96ED

**********
OBJECT MODULE NAME:  FLTAS
PROGRAM SECTIONS:    LOW    HIGH   SIZE
     <RELOC>         9739   97EB   00B3
     <ABS>           0000   0000   0000
PROGRAM ENTRIES:     ADF1$   9753     ADC1$   976E
                     ISBC2$  97CB     ALIGN$  97D4
                     ISBC1$  97BB     FIELD$  9739
                     ADF2$   9762     SBF1$   9787
                     ADC2$   977B     SBC1$   97A2
                     COMRE$  974?     SBC2$   97AF
                     SBF2$   9796

r 1103  0.525  1.698  70  level 2, 9

Figure 15: Continued
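
The load map of Figure 15 can be checked for internal consistency: each
module's SIZE should equal HIGH - LOW + 1, and the relocatable sections
should follow one another without gaps from the /B:9000 base address. The
Python sketch below performs that check. The addresses are those read from
the load map; modules whose names are not legible in the figure are
identified here by their entry point (SUB1, SUB2, SUBA to SUBD), which is a
labeling choice made for this illustration only.

```python
# (module label, LOW, HIGH) for each relocatable section, in load order.
# Labels for the unnamed modules are their entry-point names (assumption).
LOAD_MAP = [
    ("MAIN",   0x9000, 0x927B),
    ("SUB1",   0x927C, 0x93E9),
    ("SUB2",   0x93EA, 0x953B),
    ("SUBA",   0x953C, 0x957E),
    ("SUBB",   0x957F, 0x95C1),
    ("SUBC",   0x95C2, 0x9604),
    ("SUBD",   0x9605, 0x9647),
    ("FLTSUB", 0x9648, 0x968F),
    ("FDVC",   0x9690, 0x96EC),
    ("FMPC",   0x96ED, 0x9738),
    ("FLTAS",  0x9739, 0x97EB),
]
BASE_ADDRESS = 0x9000  # from the /B:9000 switch in the ALINK command


def section_sizes(load_map):
    """Size of each section in words, computed as HIGH - LOW + 1."""
    return {name: high - low + 1 for name, low, high in load_map}


def is_contiguous(load_map, base):
    """True if sections start at base and each begins right after the last."""
    expected = base
    for _, low, high in load_map:
        if low != expected:
            return False
        expected = high + 1
    return True


sizes = section_sizes(LOAD_MAP)
total = sum(sizes.values())
for name, size in sizes.items():
    print(f"{name:7s} {size:04X}")
print(f"total   {total:04X}  contiguous: {is_contiguous(LOAD_MAP, BASE_ADDRESS)}")
```

The computed total agrees with the SIZE of 07EC shown for the load module as
a whole.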


The first PIP job is initialized by the appropriate JCL statement.
The next two statements refer to a file called SDM which is created
initially on disk 0 during the running of STARAN Debug Module. These
statements unlock the previous file and then delete it in preparation
for the new file. Following this, the data file for the 60 x 60 matrix
is unlocked from disk 0, deleted from disk 0 and then copied from the
MULTICS segment called MAT60. These steps assure that the most up-to-date
version of the data is on disk 0 to be called by the program at
execution time.

$FI
$JOB BERRA [64,33],DK0:SDM
$RUN SDM
#WA
#X
#LD DK0:MTX1.ALD/NG
#.P+0=APR
#.P+1=X0
#.P+2=Y0
#.P+3=M0
#.P+4=X1
#.P+5=Y1
#.P+6=M1
#.P+7=X2
#.P+10=Y2
#.P+11=M2
#.P+12=X3
#.P+13=Y3
#.P+14=M3
#SAP 9000
#TSCM

The JCL statement for the second part of the segment has the usual
job name and UIC number as well as a statement to write a file called
SDM on disk 0. The SDM file on disk 0 will contain all the debugging
information called for during the running of the STARAN debug module.
A sample print-out of this file is also included in this appendix. After
telling the system to execute the STARAN Debug Module ($RUN SDM),
a WAIT command (#WA) is given. This command enables the user's program
to execute properly under conditions which might cause unpredictable
results or a premature halt. The WAIT is followed by a super clear
command (#X). Next, the STARAN load module is loaded in preparation
for execution; the /NG switch indicates that it is not desired to execute
at this time. The thirteen statements following the load command are
used to establish a print table of AP registers. In this case it is
required that the contents of all AP registers and all response registers,
namely X, Y, and M, be printed out. SAP is the Start Associative Processor
command. The first address following the SAP command is the first address
to be executed; the second address is the stopping address for the
execution. The print table will print the contents of the specified
registers at the stopping address. If a second address is not specified,
the entire program is executed. Following the SAP command, the STARAN debug
module is asked to create four MULTICS files, each to contain the contents
of one of the associative arrays at the time of the stop address given
above. The .PI command is used to print the contents of each entry in a
Print Table. STARAN Debug Module is then terminated with #TSCM.
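
The thirteen print table statements follow a regular pattern: entry 0 holds
the AP registers (APR), and the remaining twelve entries hold the X, Y and M
response registers of arrays 0 through 3. The entry numbers in the listing
run 0 to 7 and then 10 to 14, which suggests octal numbering; that reading
is an inference from the listing, not something the text states. Under that
assumption, the statements can be generated as in the following Python
sketch, whose function name is hypothetical.

```python
# Entry 0 is the AP register group; entries 1 onward are X, Y, M for each
# of the four arrays, in array order.
REGISTERS = ["APR"] + [f"{reg}{array}" for array in range(4) for reg in "XYM"]


def print_table_statements(registers):
    """Build '.P+n=NAME' statements, numbering entries in octal (assumed)."""
    return [f".P+{index:o}={name}" for index, name in enumerate(registers)]


statements = print_table_statements(REGISTERS)
for statement in statements:
    print(statement)
```

Thirteen entries numbered in octal end at 14, which matches the last
statement in the listing, .P+14=M3.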

$FI
$JOB PIP2 [64,33]
$RUN PIP
#MU:SDM<DK0:SDM/FA

The final PIP job writes the SDM file with the debugging information
from disk 0 to a MULTICS segment for easy access to the debugging
information. Directories to the disk files are then updated before the job
is concluded.

The complete sdm program is included as Figure 16 and the output of
the sdm program is included as Figure 17.




pr SDM

SDM            07/09/76  1209.6 edt Fri

$JOB BERRA [64,33],DK0:SDM
DATE:-05-JUL-76
TIME:-23:14:18
$RUN SDM
STARAN DEBUG
VERSION: V 2.0
*** WA
*** X
*** LD DK0:MTX1.ALD/NG
FLTSUB
LD OK
*** .P+0=APR
*** .P+1=X0
*** .P+2=Y0
*** .P+3=M0
*** .P+4=X1
*** .P+5=Y1
*** .P+6=M1
*** .P+7=X2
*** .P+10=Y2
*** .P+11=M2
*** .P+12=X3
*** .P+13=Y3
*** .P+14=M3


Figure 17: sdm Output


*** SAP 9000:90D7
AP BREAKPOINT SF AT 90D7
*** MU:OUT1/AR:000:0FF
*** MU:OUT2/AR:100:1FF
*** MU:OUT3/AR:200:2FF
*** MU:OUT4/AR:300:3FF
*** .PI
00  PC    = H  8075
01  IR    = H  3800 2000
02  C     = H  0000 0000
03  FL1   = H  00
04  FL2   = H  92
05  FP1   = H  9F
06  FP2   = H  02
07  FP3   = H  1F
10  FPE   = H  00
11  BL    = H  0000
12  DP    = H  0E4C
13  GET   = H  8045
14  PUT   = H  01BB
15  CNT   = H  0000
16  AS    = H  F000 0000
17  SLM   = H  8057
20  ELM   = H  805C
21  IMSK  = H  000E
22  HOME  = H  20
01  X0=  00000000 00000007 40000000 00000007 40000000 00000007 40000000 00000007
02  Y0=  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
03  M0=  FFFFFFFF FFFFFFFF 00000000 00000000 00000000 00000000 00000000 00000000
04  X1=  40000000 00000007 40000000 00000007 40000000 00000007 40000000 00000007
05  Y1=  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
06  M1=  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
07  X2=  40000000 00000007 40000000 00000007 40000000 00000007 40000000 00000007
10  Y2=  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
11  M2=  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
12  X3=  40000000 00000007 40000000 00000007 FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
13  Y3=  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
14  M3=  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
*** TSCM
$FI
TIME:-23:1?:30

r 1210  0.559  1.180  65  level 2, 23


Figure 17: Continued
APPENDIX B

Details of a 60 x 60 Matrix Inversion Program


B1. Matrix Inversion Application Program Overview


The listing of the APPLE code for the matrix inversion program and
the subroutines required by the program are included in this appendix.
MTX1.APL is the main program. MTX2.APL is the subroutine which does
the division in field A and the arithmetic operations on fields B and D.
MTX3.APL is the subroutine which does the arithmetic operations on fields
E and F. SUBA.APL replicates a column from the top quarter of an array
in field A; SUBB.APL replicates a column from the second quarter of
an array in field A; SUBC.APL replicates a column from the third quarter
of an array in field A; SUBD.APL replicates a column from the fourth
quarter of an array in field A. Also, each of the replicating subroutines
appends the new identity row in the proper position.

A list of the variables used by the program is shown in Figure 18;
a trace map of the program and subroutines is shown in Figure 19;
finally, detailed flow charts and the listings for the program and
subroutines are shown in Figures 20-33.

Each of the programs is broken into small sections for ease of
discussion. The small sections are first discussed in general; then the
listing of the small section is given; finally, the listing is discussed
in detail.

B2. The Main Program: MTX1.APL

The main program is initialized in the first section of the program
as follows. The subroutines are defined as external modules, giving the
main program access to them; the variables in the main program are
defined to permit the subroutines to access them; the 32 bit fields used
in the program are defined; and the arrays are cleared in preparation
for loading the new data.
A:        a 32 bit field starting at bit 0
B:        a 32 bit field starting at bit 32
D:        a 32 bit field starting at bit 64
E:        a 32 bit field starting at bit 96
F:        a 32 bit field starting at bit 128
G:        a 32 bit field starting at bit 160
H:        a 32 bit field starting at bit 192

COUNT:    The matrix dimension; keeps track of the operations. COUNT is
          initialized to sixty and decremented by one after each column
          is used in field A. When COUNT is zero, the program is
          terminated.

COUNTER:  The number of word-wise divisions (4) in each matrix; used during
          the loading operation. Each array is filled by loading sixty-one
          words in fields B, D, E and F, then skipping three words and
          repeating for a total of four times.

CONST3:   Initially 0 but changed during the program; used to point to
          the desired array during the arithmetic operations of the
          subroutines, SUB1 and SUB2. SUB1 and SUB2 operate first on
          array 0, then array 1, array 2 and array 3. The value in
          CONST3 is loaded into FP1 to enable the correct array for the
          arithmetic operations.

CONST0:   Initially 1 but incremented during the program; used to load
          into FP2, which points to the word that becomes the new identity
          word via the arithmetic operations of the subroutines SUB1 and
          SUB2.


Figure 18: Application Program Variables


VALUE:    Contains a number to be loaded into the array select register
          (AS). The initial number 80000000 enables array 0 when loaded
          in AS. The number in VALUE will be changed later in the program
          by moving the number stored in CONST to this location.
          All the arrays are successively enabled by changing the number
          in the address VALUE.

CONST:    The storage word for the next number to be placed in VALUE;
          initially 40000000, which will enable array 1 when loaded in AS.
          The number in this location will be changed by the program.

CONST4:   The number 20000000 is stored here to be later loaded into
          CONST. This number will enable array 2 when loaded in AS.

CONST5:   The number 10000000 is stored here to be later loaded into
          CONST. This number will enable array 3 when loaded in AS.

CONST1 and
CONST2:   These are initialized to three and two respectively. During
          the program they are decremented and control the number to be
          placed in location CONST.


Figure 18: Continued
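
The array select values listed in Figure 18 (80000000, 40000000, 20000000
and 10000000) each have a single bit set among the four high-order bits, one
per array. The Python sketch below decodes such a value into the list of
arrays it enables. It assumes that array k corresponds to the k-th most
significant bit of the 32-bit value, which is what the four constants
suggest; the function name is hypothetical.

```python
def enabled_arrays(select_word, num_arrays=4):
    """Return the indices of the arrays enabled by a 32-bit select value.

    Assumption: array k is enabled by the k-th most significant bit.
    """
    return [k for k in range(num_arrays) if select_word & (0x80000000 >> k)]


# The values named in Figure 18, plus the "all arrays" value used when the
# high-order half of the array select register is loaded with F000.
for value in (0x80000000, 0x40000000, 0x20000000, 0x10000000, 0xF0000000):
    print(f"{value:08X} -> arrays {enabled_arrays(value)}")
```

Moving the single set bit one position to the right at each step is what
allows the program to enable arrays 0 through 3 in succession by rewriting
VALUE.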
Initialization of the program is accomplished by the START command;
the label associated with START names the object file produced by the
assembly. The EXTRN mnemonic permits the main program to reference
labels in other program modules; in this case, the labels are in the
subroutines. Subroutines are given access to labels, i.e., variables,
defined in the main program by the ENTRY mnemonic. The ORG instruction
commands the assembler to assemble succeeding instructions at the
address specified in the argument. Fields are defined via the Define
Field (DF) command. The first number in the argument following the DF
command is the first bit position of the field; the second number refers
to the length of the field. STORE is equated (EQU) with the address
B000, the address where the matrix data will be stored. The number
B000 is a hexadecimal number, as indicated by the X preceding it. In
preparation for loading the arrays, the array select register (ASH) is
loaded immediately (LI) with F000. In this command, the H in ASH refers
to the high order bits of the array select register; the X preceding
F000 indicates to the computer that the number following is in hexadecimal;
the F in F000, which translates from hexadecimal to 1111 in binary, enables
all four arrays. The M register is set in all four arrays with the SET
command in preparation for loading the arrays. Finally, the arrays are
cleared (CLRF) in bit positions starting at 0 and ending at 256.
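
The effect of the Define Field command, a starting bit position and a length
within a 256-bit array word, can be sketched in Python as follows. The field
positions are those listed in Figure 18 (the positions of A and D are
inferred from the 32-bit spacing of the others). The sketch also assumes
that bit 0 is the leftmost, most significant bit of the word, and the helper
names are hypothetical.

```python
WORD_BITS = 256

# name: (first bit position, length), as in the arguments of DF.
FIELDS = {
    "A": (0, 32), "B": (32, 32), "D": (64, 32), "E": (96, 32),
    "F": (128, 32), "G": (160, 32), "H": (192, 32),
}


def extract_field(word, start, length):
    """Read a field from a word; bit 0 is taken to be the leftmost bit."""
    shift = WORD_BITS - (start + length)
    return (word >> shift) & ((1 << length) - 1)


def insert_field(word, start, length, value):
    """Return a copy of word with the field replaced by value."""
    shift = WORD_BITS - (start + length)
    mask = ((1 << length) - 1) << shift
    return (word & ~mask) | ((value << shift) & mask)


word = insert_field(0, *FIELDS["B"], 0xDEADBEEF)
word = insert_field(word, *FIELDS["F"], 0x12345678)
for name, (start, length) in FIELDS.items():
    print(f"{name}: bits {start:3d}-{start + length - 1:3d} "
          f"= {extract_field(word, start, length):08X}")
```

The seven fields occupy bits 0 through 223, leaving the upper bit positions
of the word free; the text later places the quarter masks in bit slices 245
through 248.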

In this next section two parts of the program are considered, one of
which occurs at the beginning of the program and the other at the end
of the program. These two sections reference each other and work together
to load the matrix data into AP control memory from a MULTICS segment.


          OPEN     BUFFER
          READ     DATA
WAIT1     IOWAIT   LINK,BUSY
          CLOSE    LINK

BUSY      ILOCK    LI  12
          B        WAIT1
ERROR     WAIT
BUFFER    OBUFF    LINK,,DK,MAT60,EXT,4,ERROR
DATA      RBUFF    LINK,STORE,14640,3
SAVE      DS
LINK      DC       ,2

The OPEN command must precede all I/O instructions; it prepares the
system for an eventual READ/WRITE instruction. The argument following
the OPEN command references the label of an OBUFF instruction, BUFFER in
this case. The OBUFF instruction contains the parameter information
required by the OPEN instruction. Note that the OBUFF instruction occurs in
the latter part of the program. In the OBUFF command are up to seven
arguments, six of which are applicable in this case. LINK references
the label of an expression referencing a required linkword, which is two
words of storage in STARAN provided by the programmer and initialized
to zero. The second argument is optional; if included, it is the logical
name of the data set. If not, as in this case, two successive commas are
used. The third argument refers to the name of the physical device
associated with the data set, in this case the disk (DK). MAT60 is the
file name of the data set and EXT is the file name extension; these are
the fourth and fifth arguments. The sixth argument is a code which
specifies how to open the file. In this case, 4 will open a previously
created linked file for input via a READ. The final argument, ERROR, is
a STARAN address where control is transferred if an error occurs during
the OPEN process.

The READ command initiates the transfer of the data into a specified
STARAN buffer. There is one required argument, DATA in this case, which
references the RBUFF instruction, which contains the parameter information
required by a READ function. The RBUFF instruction occurs in the latter
part of the program. Four arguments are required by RBUFF. LINK, the
first argument, references the same linkword provided by the OPEN command.
STORE is the address of the first location in bulk core memory for the
STARAN input buffer; note that STORE was previously defined. The third
argument, 14640, gives the maximum number of eight bit bytes of data to
be input. The final argument selects the mode of transfer; in this case,
3 selects unformatted binary.
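
The byte count of 14640 is consistent with the other quantities given for
the 60 x 60 case. The report does not spell out the derivation, so the
Python sketch below is an observation rather than the authors' stated
method: it assumes the buffer holds sixty columns, each of sixty-one 32-bit
words (the sixty-one words loaded per column according to the program
description).

```python
MATRIX_DIMENSION = 60      # columns in the 60 x 60 matrix
WORDS_PER_COLUMN = 61      # words loaded per column (from the text)
BITS_PER_WORD = 32         # width of the fields defined with DF
BYTES_PER_WORD = BITS_PER_WORD // 8

# Assumed layout: every column is stored in full, one after another.
buffer_bytes = MATRIX_DIMENSION * WORDS_PER_COLUMN * BYTES_PER_WORD
print(buffer_bytes)
```

Under these assumptions the product reproduces the third RBUFF argument.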

IOWAIT is an instruction to allow the user to determine when the
input buffer is filled. The first argument is required and references
the same linkword referenced previously; the second argument is optional
and specifies an address to which to transfer control while the I/O
process is still in progress. In this case, the control is transferred
to BUSY. BUSY is a label in the latter part of the program. If the
program goes to BUSY, interlock number twelve is set. The instruction
following BUSY is a branch back to IOWAIT.

The CLOSE command is always related to a particular OPEN command.
When the I/O is finished, the CLOSE command is executed, thus releasing
the system for the next instruction. The final instruction of this
section, SAVE, is one word of storage used for temporary storage of a
counter as the program runs.


In  the  portion  of  the  program  which  follows,  the  initialization  of 
the  program  continues. 


          B        $+11
CONST5    DC       X'10000000'
CONST4    DC       X'20000000'
CONST3    DC       0
CONST2    DC       2
COUNTER   DC       4
COUNT     DC       60
VALUE     DC       X'80000000'
CONST1    DC       3
CONST0    DC       1
CONST     DC       X'40000000'


The constants (DC) referenced by the main program and the subroutines
are defined. Recall that the X preceding numbers indicates they are
hexadecimal. The initial statement is a branch (B) around the constant
definitions; the branch is necessary since these are non-executable
statements. All of the constants above are fully explained in Figure 18.

The objective of this section is to generate masks used by the
program during subsequent arithmetic routines and then to store these
masks in the arrays for future reference.

          CLR      X
          CLR      Y
          CLR      M
          LI32     C,X'FFFFFFFF'
          GEN,32   X'420088A0'
          GEN,32   X'42208840'
          GEN,32   X'40009943'
          L        M,Y
          GEN,32   X'1AF50001'
          GEN,32   X'40C08852'
          CLR      M
          L        M,Y
          GEN,32   X'1AF60001'
          GEN,32   X'40C08852'
          CLR      M
          L        M,Y
          GEN,32   X'1AF70001'
          GEN,32   X'40C08852'
          CLR      M
          L        M,Y
          GEN,32   X'1AF80001'

Before using the response store registers (X, Y, M), they are cleared
(CLR). The LI32 command loads all thirty-two bits of the common
register (C) with the hexadecimal number FFFFFFFF. When translated to
binary, this number is thirty-two contiguous ones. Following the loading
of the C, there are three machine instructions which first load the
contents of the C into bits 0 to 31 of the Y register; second, load the
contents of the C into bits 32 to 63 of the X register; and finally
logically OR X and Y, leaving the result in Y. The net result is sixty-four
ones in bits 0 to 63 of the Y register. Note that it is necessary to
proceed in this way since loading a response register from the common
register automatically clears the rest of the register.

The contents of Y are loaded (L) into M, the mask register. Since
this mask is used many times in the program, it is stored in a bit
slice of the array for future reference. The next machine instruction
stores the contents of M into bit slice F5 (245) of all four arrays.

Recall the Y register has ones in bits 0 to 63, the top quarter of
the register. The next machine instruction shifts the contents of Y by
64 bits, with the result that there are now ones in the bit positions
64 to 127. Next, clear (CLR) M in preparation for loading (L) M with
the contents of Y. The machine instruction following the loading of M
stores the contents of the M register to bit slice F6 (246) of all four
arrays.

It is now necessary to generate the mask for the third quarter.
This is accomplished by a machine instruction to again shift the contents
of Y by 64 bits so that the ones now occur in bit positions 128 to 191.
Again clear (CLR) M in preparation for loading (L) Y into M. The next
machine instruction stores the contents of M into bit slice F7 (247) of
all four arrays.

The final section is again a repeat of the above. The Y register
is first shifted by 64 bits so that the ones are in positions 192 to
255. Then, after clearing (CLR) M, load (L) the contents of Y into M
and finally store M into bit slice F8 (248) of all four arrays.

The  net  result  of  this  section  Is  that  there  are  four  masks  for 
future  reference  which  are  stored  In  the  arrays  for  easy  access  by  the 
program. 
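
The mask construction described above can be illustrated with a short
simulation. This is only a sketch in Python, not STARAN code: each
256-bit response store register is modeled as an integer, bit 0 is
assumed to be the leftmost bit, and the names WIDTH, ones and
build_quarter_masks are hypothetical helpers introduced for the
illustration.

```python
WIDTH = 256  # bits in one response store register (modeling assumption)

def ones(first, last):
    """Return a register value with ones in bit positions first..last."""
    value = 0
    for bit in range(first, last + 1):
        value |= 1 << (WIDTH - 1 - bit)   # bit 0 is modeled as the leftmost bit
    return value

def build_quarter_masks():
    """Fill Y with 64 ones, then store it and shift it, four times over."""
    c = 0xFFFFFFFF                      # LI32 C,X'FFFFFFFF'
    y = c << (WIDTH - 32)               # C into bits 0..31 of Y
    x = c << (WIDTH - 64)               # C into bits 32..63 of X
    y |= x                              # OR X and Y, result left in Y
    masks = {}
    for bit_slice in (0xF5, 0xF6, 0xF7, 0xF8):
        masks[bit_slice] = y            # L M,Y, then store M in the bit slice
        y >>= 64                        # shift Y by 64 bit positions
    return masks
```

Each stored mask enables one quarter (64 words) of an array, and the
four masks together cover all 256 words exactly once.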

In this portion of the program data are loaded into the arrays,
specifically field A, which will contain the first column of matrix
data. The column is loaded once and then replicated in field A, four
times in each array.

Initially, FPE, one of the field pointers, is loaded with 61 since
61 words of data are to be loaded. After each word is loaded, FPE will
be decremented; when it reaches zero, the program will branch out of the
loading operation. DP, FP1, and FP2 are initialized to zero. DP is a
counter used to move progressively through the data buffer in STARAN
memory. The field pointers initialize the array and word. The first
word from the buffer STORE modified by DP, that is, with DP added to the
address, is loaded into the common register with the load register (LR)
command. The value 2 automatically increments DP. The number in the
common register is then stored in the array (SCW) from field A of the
common register to field A of the array in a position selected via the
field pointers, FP1 and FP2. After each word is loaded, FP2 is incremented,
resulting in a move to the next word in field A. FP1 is not incremented
since array 0 will continue to be loaded. The counter FPE is
decremented (DECR); and, as long as it is not zero, the program will branch
(BNZ) back to the beginning of the load operation (LOADA).


The next section of the program takes the data from field A and uses
the response store registers to replicate it four times in each array.

The loop to achieve replication is initialized by setting the field
pointers to zero and clearing the response store registers, X and Y.

The loop proceeds as follows.

1.  Loop thirty-two times to address MOVEA. The loop is performed
thirty-two times since each word is thirty-two bits long.

2.  Load the bit slice selected by FP2 of words 0 to 31 into the
common register.

3.  Load the common register into Y, bits 0 to 31.

4.  Load the bit slice selected by FP2 of words 32 to 63 into the
common register.

5.  Load the common register into X, bits 32 to 63.

6.  Logically OR X and Y, leaving the result in Y. At this point
one bit slice of field A is in Y, bits 0 to 63.

7.  Store Y into X.

8.  Shift Y by 64 bits; this results in the Y register containing
the bit slice in positions 64 to 127.

9.  Logically OR X and Y, leaving the result in Y. Y now contains
the bit slice replicated twice, in bits 0 to 63 and 64 to 127.

10.  Store Y into X.

11.  Shift Y by 128 bits. Y now contains the bit slice replicated
twice in bit positions 128 to 191 and 192 to 255.

12.  Logically OR X and Y and store the result in Y. This results
in Y containing the bit slice replicated four times.

13.  Write Y into all arrays in the bit slice pointed to by FP2;
increment FP2.

The final statement is a no-operation used here to allow for automatic

speed up execution.
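
The shift-and-OR replication of steps 2 to 12 can be traced with a small
model. The following is a sketch in Python under stated assumptions, not
the APPLE code itself: a register is modeled as a list of 256 bits with
bit 0 first, a shift moves bits toward higher positions and fills with
zeros, and the names shift, logical_or and replicate_slice are
hypothetical.

```python
WIDTH = 256  # bits per response store register (modeling assumption)

def shift(reg, count):
    """Move every bit `count` positions toward higher bit numbers."""
    return [0] * count + reg[:WIDTH - count]

def logical_or(a, b):
    """Bitwise OR of two registers."""
    return [p | q for p, q in zip(a, b)]

def replicate_slice(slice_bits):
    """Replicate the 64 bits of one bit slice four times across Y."""
    y = slice_bits[:32] + [0] * (WIDTH - 32)               # steps 2-3
    x = [0] * 32 + slice_bits[32:64] + [0] * (WIDTH - 64)  # steps 4-5
    y = logical_or(x, y)      # step 6: slice in bits 0..63
    x = y[:]                  # step 7: store Y into X
    y = shift(y, 64)          # step 8: slice in bits 64..127
    y = logical_or(x, y)      # step 9: two copies, bits 0..127
    x = y[:]                  # step 10
    y = shift(y, 128)         # step 11: two copies, bits 128..255
    y = logical_or(x, y)      # step 12: four copies
    return y
```

Two shifts and three OR operations suffice because each OR doubles the
number of copies already held in Y.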

In this section the loading operation is continued by loading arrays
0, 1, 2, and 3, in that order, until all the matrix data are in the arrays.
This section is specifically concerned with loading fields B, D, E and F in
arrays 0, 1 and 2. Array 3 is loaded in the following section of code
since the data configuration for array 3 differs from arrays 0, 1 and 2.

Again, initialize (LI) FPE with 61, the number of words to be
loaded in each field. FP2, the word pointer, is initialized (LI) to
zero. FP3 is loaded (LI) with 3. FP3 is used to keep track of duplicate
array operations. In this case, arrays 0, 1 and 2 are loaded in the
same way; thus, FP3 is initialized to 3. Successively, load (LR) the
common register with data from STORE(DP) and store (SCW) it in the
array in fields B, D, E and F in the word position pointed to by FP2.
Next, increment (INCR) FP2 and decrement (DECR) FPE. If FPE is not
zero, branch (BNZ) back to the beginning of the loading operation, LOAD.
When FPE reaches zero, FP2 must be incremented by three to move to the
next 64-word block. (Recall that 61 data words are loaded in each
block.) Since each array is to be loaded with four sections of columns,
COUNTER is initially 4 to keep track of the number of sections that have
been loaded, as follows. First, the value in DP must be saved so that the
program will be able to continue the loading operation at the correct
position in the data buffer; this is done by storing (SR) the value in DP
in location SAVE. Then COUNTER is loaded (LR) into the two registers BL
and DP. It is necessary to load into BL and DP combined since each is a
16-bit register and the number in COUNTER is a 32-bit number. However,
since the number of significant digits of the number is small, it will be
fully contained in the lower-order bits, that is, DP. Therefore, to
decrement COUNTER it is only necessary to decrement (DECR) DP. The new
value of COUNTER is then saved by storing (SR) DP in the memory location
COUNTER. The value of COUNTER, which is still in DP, is tested next. If
the value is zero, branch (BZ) to the address NEXTARRAY to load the next
array. If COUNTER is not zero, continue in the same array by again
initializing (LI) FPE to 61, recalling (LR) the value from SAVE to DP,
and branching (B) to LOAD. However, once COUNTER is zero, it is necessary
to begin loading the next array. This means a branch to NEXTARRAY. At
NEXTARRAY, COUNTER is again initialized (LI) to four and stored (SR) at
location COUNTER. FP3, which is keeping track of the number of duplicate
array operations, is decremented (DECR). If FP3 is not zero, the loading
continues as previously described by initializing FP2 to zero,
incrementing FP1 (the array pointer), loading (LI) FPE with 61 and
branching to LOAD. However, if FP3 is zero, the program will branch (BZ)
to ARRAY3, the loading operation for array 3.
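
The nested counting described above (FP3 over arrays, COUNTER over
sections, FPE over words) can be sketched as follows. This is a Python
illustration only; the function name load_positions and its parameters
are hypothetical, and the sketch records just the (array, word)
positions visited, each of which would receive one word in each of
fields B, D, E and F.

```python
def load_positions(arrays=3, sections=4, words=61, block=64):
    """Return the (array, word) positions in the order they are loaded."""
    positions = []
    for fp1 in range(arrays):          # FP3 counts the arrays down from 3
        fp2 = 0                        # word pointer reset for each array
        for _ in range(sections):      # COUNTER counts sections down from 4
            for _ in range(words):     # FPE counts words down from 61
                positions.append((fp1, fp2))
                fp2 += 1               # INCR FP2
            fp2 += block - words       # skip 3 words to the next 64-word block
    return positions
```

With the default values the sketch visits 732 positions, and the first
word of every section falls on a multiple of 64.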

In this section, the last array is loaded. Recall from Figure 5
the configuration of the data in array 3 and note the difference from
arrays 0, 1 and 2.


ARRAY3  SR      DP,SAVE
        LI      (BL,DP),2
        SR      (BL,DP),COUNTER
        LR      DP,SAVE
        LI      FP2,0
        NOP
        LI      BL,3
LOADA3  LR      C,STORE(DP),3
        SCW     A,B
        LR      C,STORE(DP),3
        SCW     A,D
        LR      C,STORE(DP),3
        SCW     A,E
        LR      C,STORE(DP),3
        SCW     A,F
        INCR    FP2
        DECR    FPE
        BNZ,FPE LOADA3
        SR      DP,SAVE
        LR      (BL,DP),COUNTER
        DECR    DP
        SR      (BL,DP),COUNTER
        BZ,DP   LASTA
        LR      DP,SAVE
        RPT,3
        INCR    FP2
        LI      FPE,61
        B       LOADA3

This section begins a pattern which is repeated sixteen times.
First, branch and link the subroutines which perform the necessary
arithmetic operations. These subroutines are explained in full detail
in the following sections. Then, after the appropriate masking, the
data are moved so that the next column to be used as the pivot is moved
into field A. The counter which was initialized to the matrix dimension
is checked and, if it is zero, indicating the inversion is complete, the
program branches to its end; otherwise the program continues with column
operations.

        BAL,R7  SUB1
        BAL,R7  SUB2
        LR      ASH,VALUE
        GEN,32  X'OSFSOOOS'
        MVF     A,G
        MVF     B,A
        MVF     G,B
        LR      (BL,DP),COUNT
        DECR    DP
        BZ,DP   OUT
        SR      (BL,DP),COUNT
        LI      ASH,X'F000'
        BAL,R7  SUBA


The branch and link command (BAL) is followed by one of the branch
and link registers (R7). This instruction transfers control of the
program to the subroutine SUB1 after storing the Execution Location
Counter of the next instruction in the branch and link register. Briefly,
SUB1 will divide field A to create the identity element and then perform
the arithmetic operations between column A and columns B and D. Next,
branch and link as above to subroutine SUB2; SUB2 performs the arithmetic
operations between field A and fields E and F. At this point, all required
arithmetic operations with the column currently in field A have been
completed and the program proceeds to move a new column into the pivot
position. Since the new pivot column will change, the appropriate
array must be enabled. Therefore, load (LR) the array select register
(ASH) with the number in the address VALUE. Initially VALUE has
80000000, which will enable array 0. After completing the sixteen
repetitions of this portion of the code which follow, all of the columns
in array 0 will have been used as the pivot column. The next step will
be to move to array 1, which will be accomplished by changing the number
in VALUE to 40000000 and repeating the program. After completing array
1, VALUE is changed to 20000000, thus enabling array 2, and the same code
is repeated again. Finally, VALUE will be changed to 10000000 to enable
array 3 and the code repeated until completion of the inversion.

The machine instruction moves the mask stored in bit slice F5 (245)
to the M register. Note in the sections following that the program will
load from bit slice F5, F6, F7 and F8, depending upon which section of
the array it is desired to work with. Also, each of these masks is used
four times since there are four columns in each quarter of the array to
be used as the pivot. With the appropriate mask in place, the column in
field A is stored in field G. Then, the next column to be used is moved
into field A. (In this case, field B is moved to field A, although in
subsequent repetitions of this code the field moved into A will change.)
Finally, the column temporarily stored in field G is moved (MVF) into
the place just vacated.

At this point it is desired to check to see if all columns have
been used as a pivot. Therefore, load (LR) registers BL and DP with the
counter COUNT, which was initialized to the matrix dimension of 60.
Then, decrement COUNT; and, if it is zero, branch (BZ) to the end of the
program (OUT). Otherwise, store (SR) the new value in the address COUNT.
All of the arrays are enabled by loading (LI) the array select register
(ASH) with F000. The program will then branch and link SUBA. SUBA,
like SUBB, SUBC, and SUBD, takes the new data in field A and replicates
it four times. The arithmetic operations will now begin. The next
sections describe this and continue repeating the above operations until
the matrix is inverted.
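
The three field moves that exchange the pivot column can be pictured
with the sketch below. It is a Python illustration under an assumption
the text only implies, namely that a field move (MVF) affects only the
words enabled by the mask in M; the names masked_move and rotate_pivot
are hypothetical.

```python
def masked_move(dst, src, mask):
    """Copy src to dst in the words where the mask bit is set."""
    for word, enabled in enumerate(mask):
        if enabled:
            dst[word] = src[word]

def rotate_pivot(fields, mask):
    """Exchange fields A and B under the mask, using G as temporary."""
    masked_move(fields["G"], fields["A"], mask)   # MVF A,G
    masked_move(fields["A"], fields["B"], mask)   # MVF B,A
    masked_move(fields["B"], fields["G"], mask)   # MVF G,B
```

For example, with a four-word model and the first two words enabled,
only those two words of A and B are exchanged.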


This  last  portion  of  the  program  is  needed  to  change  the  number 
stored  in  VALUE.  Recall  that  VALUE  is  loaded  into  the  array  select 
register  in  the  previous  sections.  Another  storage  word  in  address 
CONST  is  used  to  accomplish  this. 


        LR      (BL,DP),CONST
        SR      (BL,DP),VALUE
        LR      DP,CONST3
        INCR    DP
        SR      DP,CONST3
        LR      DP,CONST1
        DECR    DP
        SR      DP,CONST1
        BZ,DP   NEXT
        LR      (BL,DP),CONST4
        SR      (BL,DP),CONST
        LR      DP,CONST2
        DECR    DP
        SR      DP,CONST2
        BNZ,DP  NEXT
        LR      (BL,DP),CONST5
        SR      (BL,DP),CONST
        B       NEXT
        NOP
        NOP
        WAIT

The first time through the arithmetic routines, VALUE is loaded
with 80000000, the initial number, enabling array 0. CONST, which
initially is 40000000, is then loaded (LR) into the registers BL and DP
and stored (SR) under VALUE. Next, load (LR) register DP with CONST3
and then increment (INCR) DP and store (SR) the new number at CONST3.
CONST3 is used by the subroutines SUB1 and SUB2 to be loaded in FP1, the
array pointer. At this point, the operations on one array have been
completed. CONST3 is therefore incremented so that the program will
move to the next array.

CONST1, initially 3, is loaded (LR) into the DP register and then
decremented (DECR). The purpose of this counter is to count the number
of times VALUE must be changed. If VALUE has been changed three times,
the counter will then be zero and the program will branch (BZ) to the
arithmetic routines without changing VALUE. Otherwise registers BL and
DP are loaded (LR) with CONST4, initially 20000000, and then stored (SR)
under CONST. Next another counter, CONST2, is checked by loading (LR) it
in DP, decrementing (DECR) DP and storing (SR) DP back at CONST2. This
counter helps to keep track of the number to be stored in CONST. If
it is not zero, branch (BNZ) to NEXT, the start of the arithmetic
routines. Otherwise load (LR) registers BL and DP with CONST5 and store
(SR) CONST5 under CONST. CONST5 has the number 10000000. Then branch (B)
to NEXT.

The last statement is a NOP at the label OUT. This is the address
the program branches to upon completion of the inversion program; that
is, when COUNT is zero, indicating all columns have been used as the pivot.
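
The bookkeeping with VALUE and the CONST words can be followed in a
small simulation. This is a Python sketch, not the APPLE code; the
storage words are modeled as dictionary entries, the function name
advance_array is hypothetical, and the initial value 2 for CONST2 is an
assumption inferred from the required sequence, since the text does not
state it.

```python
def advance_array(state):
    """One pass through the code that changes VALUE after an array is done."""
    state["VALUE"] = state["CONST"]       # LR (BL,DP),CONST / SR (BL,DP),VALUE
    state["CONST3"] += 1                  # array pointer for the subroutines
    state["CONST1"] -= 1                  # counts remaining changes of VALUE
    if state["CONST1"] == 0:
        return                            # BZ,DP NEXT
    state["CONST"] = state["CONST4"]
    state["CONST2"] -= 1
    if state["CONST2"] != 0:
        return                            # BNZ,DP NEXT
    state["CONST"] = state["CONST5"]      # last array select value

initial_state = {
    "VALUE": 0x80000000, "CONST": 0x40000000,
    "CONST1": 3, "CONST2": 2,             # CONST2 = 2 is an assumption
    "CONST3": 0, "CONST4": 0x20000000, "CONST5": 0x10000000,
}
```

Calling advance_array three times on a copy of initial_state steps VALUE
through 40000000, 20000000 and 10000000, matching the array select
sequence described for arrays 1, 2 and 3.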
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1.  MTX2.APL 


The first section of the subroutine consists of the initialization
conditions similar to the initialization of the main program.

The first step in loading array 3 is to store (SR) the value
in DP in the location SAVE. The registers BL and DP are loaded (LI)
with two and that value is stored (SR) at location COUNTER. In
preparation for the loading of array 3, DP is loaded (LR) with the
value previously stored in SAVE and FP2 is initialized (LI) with
zero. Note that FPE was initialized (LI) to 61 by the previous
section of code. The loading operation is an exact replica of the
previous operation except that the process is repeated only two
times; when COUNTER has then decreased to zero the program branches
to LASTA. LASTA is also a loading operation; however, in this case
the program only loads data into fields B, D, and E. The loading
procedure is the same as previously described.

The START instruction is required and must precede any statements
which generate APPLE code; the label associated with START will occur
on the load map. The ENTRY command gives the main program entry to
this subroutine at the label SUB1. The program is flagged as being
relocatable by the R following the 0 after the ORG statement. The
relocation takes place automatically when the programs are linked.

It is also necessary to define the fields again in the subroutines
in the same way as in the main program.

This portion of the subroutine divides field A to create the
identity element in the diagonal position of the matrix.

The label SUB1 is the entry point of the main program to this
subroutine. This operation is initiated by loading (LI) field pointer
one (FP1) with zero. Recall, when field pointer one is loaded with
zero, the program is directed to array 0. The response register M is
set (SET) in all four arrays to enable the division to take place in
each element of field A across all four arrays. Field pointer two
(FP2) is loaded (LR) with CONST0; CONST0 is initialized with the
value one. The purpose of CONST0 is to give the word position of
the diagonal element. Since the identity row is appended to the
top of the matrix, the diagonal element will first occur in word one.
This counter will be incremented as the program proceeds. The
common register is loaded (LC) with the number from field A pointed
to by FP1 and FP2. Division of field A by the common register is
accomplished by the floating point routine FDVC. The bit slice (240)
following the FDVC command must be provided to save the original contents
of the M register. Two entries are required as arguments following
FDVC; the first entry represents the dividend and the second the
quotient. The quotient is then moved (MVF) from field G to field A.

The general idea behind the multiplication routine is to first
perform the multiplication in field B on the top quarter of each
array in succession; second, shift the mask by 64 bits and perform
the multiplication in field B on the second quarter of each array;
third, shift the mask again by 64 bits and multiply field B of each
array in succession; and finally shift the mask the final time to
operate on the bottom quarter of the array.

First, all four arrays are enabled by loading (LI) the array
select register (ASH) with F000. Then, after clearing (CLR) M, the
previously stored mask is loaded from bit slice F5 (245) into M with
a machine instruction. Recall this mask is on bits 0 to 63, the
top quarter of the arrays. Then the multiplication is initialized by
loading (LI) BL with four to be used as a counter. FP1 is loaded
(LI) with zero and the array select register (ASH) with 8000 to
indicate the intention of beginning with array 0. The common register
is loaded (LC) with the word in field B pointed to by the field
pointers. Recall, FP2 was previously loaded with CONST0. FMPC is
the floating point arithmetic macro which multiplies all of masked
field A by the common register, placing the result in field H. Bit
slice 240 is used to store the original mask. FP1 is now incremented
(INCR) to one, and the array select register (ASH) is set to 4000, thus
enabling array 1. The loading of the common register from field B is
repeated, this time from array 1. The multiply macro FMPC multiplies
field A in array 1 by the common register, placing the result in field
H. FP1 is incremented (INCR) to two, and the array select register
is loaded with 2000 to enable array 2. Again the multiplication (FMPC)
is performed, this time in array 2. FP1 is incremented (INCR) to three,
and the array select register is loaded (LI) with 1000 to enable array 3.
After loading the common register (LC) with the word pointed to by FP1
and FP2 in array 3, the multiplication (FMPC) is performed. At this
point the top quarter of each array has been multiplied. The array
select register (ASH) is loaded (LI) to enable all four arrays. BL is
decremented (DECR) to indicate one quarter of the operation is complete.
After clearing (CLR) Y, the mask from M is loaded (L) into Y. A machine
instruction shifts Y by 64 bits; then, after clearing (CLR) M, M is
loaded (L) from Y. The program is now in a position to operate on the
second quarter of the arrays. If the BL register is zero, branch to
subtraction (SUBT); otherwise, repeat (RPT) sixty-four times the
first instruction following RPT. In this case, increment (INCR) FP2
by sixty-four. This places the program at the correct word position for
the columns in the second quarter of the arrays. Finally, branch (B)
back to MULTB and continue as before. This code will be executed
a total of four times; the first time is on the top quarter of the
arrays (words 0 to 63), the second time is on words 64 to 127, the
third time is on words 128 to 191, and the fourth time is on words
192 to 255. The final result is that field A will have been
multiplied by the word pointed to by FP1 and FP2 in field B for all
columns in field B. The result of this multiplication will be in field H.

In this part of the code, the subtraction operation is performed
and FP2 is reset to begin in field D.
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DECR  FP2 


The subtraction routine begins by setting (SET) M to enable
all words in each array. FSBF is the floating subtraction routine
whose arguments represent the minuend (field B), the subtrahend
(field H), and the difference (field G). Bit slice 240 is used to
store the original contents of M. Field G is then moved (MVF) to
field B; and finally FP2 is decremented by 192 in preparation for a
repetition of the previous multiplication steps, this time in field
D. This returns FP2 to the value it contained before the program
started to multiply in field B.
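
The divide, multiply and subtract steps amount to one column operation
of the elimination. The sketch below is a simplified Python model on a
single column stored as a list; it ignores the arrays, quarters and
masks, and the function names normalize_pivot and eliminate are
hypothetical.

```python
def normalize_pivot(column_a, pivot_row):
    """Divide the pivot column by its diagonal element (the FDVC step)."""
    pivot = column_a[pivot_row]
    return [value / pivot for value in column_a]

def eliminate(column_a, column_b, pivot_row):
    """Subtract a multiple of the pivot column from another column."""
    c = column_b[pivot_row]                       # LC: word of B in the pivot row
    h = [a * c for a in column_a]                 # FMPC: field A times C into H
    return [b - p for b, p in zip(column_b, h)]   # FSBF: B minus H
```

For a pivot column (2, 4) and another column (3, 5) with the pivot in
row 0, the pivot column becomes (1, 2) and the other column becomes
(0, -1), so the pivot row of the other column is cleared.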

The rest of the code in this subroutine repeats the above
operations with the exception that the arithmetic routines are
performed with elements from field D rather than field B.

Figure 23: MTX2.APL Listing
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MTX3.APL 


The second subroutine is almost identical to the first subroutine.
Its purpose is to perform the arithmetic operations on fields E and
F. Since the identity element in field A was previously created
in MTX2.APL, that part of the program is not repeated. However,
the multiplication and subtraction portions duplicate MTX2.APL
except for the fields they operate on. There is one change, however,
that should be noted. At the end of the subtraction in field F, FP2
is decremented by 191 rather than 192. This has the effect of
increasing CONST0 so that the next time the subroutine is called, the
identity element is created in the next sequential position.
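
The pointer arithmetic on FP2 can be checked with a few lines. This is a
Python sketch only; the function name walk_quarters is hypothetical, and
it assumes, as the multiplication description indicates, that FP2 is
advanced by 64 three times (once between each pair of quarters).

```python
def walk_quarters(fp2, final_decrement):
    """Return the word positions visited and FP2 after the final decrement."""
    visited = []
    for quarter in range(4):
        visited.append(fp2)
        if quarter < 3:
            fp2 += 64                  # RPT,64 followed by INCR FP2
    return visited, fp2 - final_decrement
```

Starting from word 1, a decrement of 192 returns FP2 to 1, while a
decrement of 191 leaves it at 2, the next sequential diagonal position.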
Figure 25: MTX3.APL Listing
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SUBA.APL 
This program takes data from the first quarter of any array
and replicates it sixteen times in field A. In addition, the
routine loads the identity row into the matrix in the new position.

The initial portion of this program has the same purpose as the
initial portion of the previous subroutine. In this case, the entry
point to the subroutine is at the label SUBA.

In this portion of the subroutine the data are replicated in field A.

Except for initializing FP1 with CONST3, this section of the
subroutine is exactly the same as the section of the main program
which replicates the data. CONST3 is initially 0, indicating the
intention of working in array 0. It is incremented each time the
main program has been executed, thus moving the subroutine from array
to array.

Here the new identity row is appended to the matrix in preparation
for using a new pivot column.

        INCR    FP2
        DECR    FPE
        BNZ,FPE ONE
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENTA

This operation is initialized by loading (LI) FP3 with 4 to
count the arrays, FP1 with 0 to begin in array 0, and FP2 with
CONST0, which when decremented indicates the word position of the new
identity row. Recall that CONST0 was used by the previous arithmetic
subroutines to indicate the position of the number to be used
for division and multiplication. FPE is loaded with 4 since the
column in field A is replicated four times in each array; the common
register is loaded (LI32) with one (01400000 in hex) and the value
in the common register is stored (SCW) into field A in the position
pointed to by the field pointers FP1 and FP2. FP2 is then incremented
64 times (RPT,64) to place the program in the second quarter; FPE is
decremented. FPE is now tested and, if it is not zero, the program
branches (BNZ) to ONE and repeats storing one in field A. Once FPE
reaches zero, an array has been completed. The program
increments (INCR) FP1 to point to the next array and decrements (DECR)
FP3. FP3 is the counter which counts the number of arrays. If FP3 is
not zero, branch (BNZ) to the beginning (IDENTA) of the operation and
repeat for the next array; otherwise continue.

In this section, the zero element of the identity row is stored
in all columns.


        LI      FP3,4
        LI      FP1,0
IDENT   LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ZERO    LI32    C,X'80000000'
        SCW     A,B
        SCW     A,D
        SCW     A,E
        SCW     A,F
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ZERO
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENT
        B       0(R7)

This final portion operates in the same way as the previous
portion except that a zero (80000000 in hex) is loaded into the
common register and that zero is stored into fields B, D, E, and F of
each array. The procedure is identical to the previous section
except that the program must store four times at each step.
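
The placement of the new identity row can be summarized in a short
model. The following Python sketch combines the two passes (ones into
field A, zeros into fields B, D, E and F) into a single loop for
brevity; the function name append_identity_row and the representation
of each array as a dictionary of 256-word fields are hypothetical, and
the two constants are simply the hex patterns quoted in the text.

```python
ONE = 0x01400000    # pattern the text gives for the value one
ZERO = 0x80000000   # pattern the text gives for the value zero

def append_identity_row(arrays, const0):
    """Store the identity row in every quarter of every array."""
    for array in arrays:                 # FP1 runs 0..3, FP3 counts down from 4
        word = const0 - 1                # LR FP2,CONST0 followed by DECR FP2
        for _ in range(4):               # FPE counts the four quarters
            array["A"][word] = ONE       # pivot column receives the one
            for name in "BDEF":
                array[name][word] = ZERO # other columns receive zero
            word += 64                   # RPT,64 followed by INCR FP2
```

With CONST0 equal to 2, for instance, the row is written at words 1, 65,
129 and 193 of each array.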




SUBA.APL  10/14/76  1124.1 edt Thu

SUB3    START
        ENTRY   SUBA
        EXTRN   COUNTER,FOUR,VALUE,CONST3
        EXTRN   CONST1,CONST,CONST0,CONST2
        ORG     0,R
A       DF      0,32
B       DF      32,32
D       DF      64,32
E       DF      96,32
F       DF      128,32
G       DF      160,32
H       DF      192,32
SUBA    LR      FP1,CONST3
        LI      FP2,0
        CLR     X
        CLR     Y
        LOOP,32 MOVEB
        GEN,32  X'63C0A0FD'
        GEN,32  X'42008840'
        GEN,32  X'63C1A0FD'
        GEN,32  X'42E088A0'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40C08852'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40808852'
        GEN,32  X'40009943'
MOVEB   GEN,32  X'1B600002'
        NOP
        LI      FP3,4
        LI      FP1,0
IDENTA  LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ONE     LI32    C,X'01400000'
        SCW     A,A
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ONE
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENTA

Figure 27: SUBA.APL Listing

        LI      FP3,4
        LI      FP1,0
IDENT   LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ZERO    LI32    C,X'80000000'
        SCW     A,B
        SCW     A,D
        SCW     A,E
        SCW     A,F
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ZERO
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENT
        B       0(R7)
        END
        END

r 1125  0.332  1.176  56

Figure 27: Continued

SUBB.APL, SUBC.APL, and SUBD.APL


These subroutines are identical to SUBA.APL except that they
use bit slices from successive quarters of field A for replication.
Specifically, SUBA.APL loads the common register from words 0 to 63,
SUBB.APL uses words 64 to 127, SUBC.APL uses words 128 to 191 and
SUBD.APL uses words 192 to 255.


pr  SUBB.API 


SLIBB.APL  J.0/14/76  1125.6  edt  Thu 


SUB4 

START 

ENTRY 

SUBB 

EXTRN 

C 0 1 1 N r E R > F 0 U R > U A 1.  U E > C 0 N S ( 3 

EXTRN 

CONST 1 > CONST  > CONS  I 0 • CnNST2 

ORG 

0>R 

A 

DP 

0>32 

t< 

DF 

32  > 32 

n 

DF 

64>32 

E 

DF 

96  >32 

F 

DF 

128 >32 

G 

DF 

160 >32 

H 

DF 

192 >32 

SUBB 

LR 

FPl >C0NST3 

LI 

FP2>0 

CLR 

X 

CLR 

Y 

LOOP >32 

MOOEC 

GEN»32 

X'63C2A0FD' 

GEN >32 

X' 42008840' 

GEN >32 

X'63C3A0FD' 

GEN >32 

X'42E088A0' 

GEN >32 

X' 40009943' 

GEN >32 

X'400088A2' 

GEN >32 

X'40C08852' 

GEN >32 

X' 40009943' 

GEN >32 

X'400088A2' 

GEN >32 

X' 40808852' 

GEN >32 

X'40009943' 

MOVEC 

GEN>32 

NOP 

X'1B600002' 

LI 

FP3>4 

LI 

FP1>0 

IDENTA 

LR 

FF’2>C0NST0 

DECR 

FP2 

LI 

FPE>4 

ONE 

LI32 

C>X'01400000' 

sew 

RPT>64 

A > A 

I NCR 

FP2 

DECR 

FPE 

BNZ>FPE 

ONE 

I NCR 

FPl 

DECR 

FP3 

BNZ>FP3 

IDENTA 

- 

Figure  29;  SUBB.AFL  Listing 
        LI      FP3,4
        LI      FP1,0
IDENT   LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ZERO    LI32    C,X'80000000'
        SCW     A,B
        SCW     A,D
        SCW     A,E
        SCW     A,F
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ZERO
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENT
        B       0(R7)
        END
        END

r 1126  0.350  1.080  54
Figure 29: Continued
pr SUBC.APL

SUBC.APL  10/14/76  1253.8 edt Thu

SUB5    START
        ENTRY   SUBC
        EXTRN   COUNTER,FOUR,VALUE,CONST3
        EXTRN   CONST1,CONST,CONST0,CONST2
        ORG     0,R
A       DF      0,32
B       DF      32,32
D       DF      64,32
E       DF      96,32
F       DF      128,32
G       DF      160,32
H       DF      192,32
SUBC    LR      FP1,CONST3
        LI      FP2,0
        CLR     X
        CLR     Y
        LOOP,32 MOVED
        GEN,32  X'63C4A0FD'
        GEN,32  X'42008840'
        GEN,32  X'63C5A0FD'
        GEN,32  X'42E088A0'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40C08852'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40808852'
        GEN,32  X'40009943'
MOVED   GEN,32  X'1B600002'
        NOP
        LI      FP3,4
        LI      FP1,0
IDENTA  LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ONE     LI32    C,X'05400000'
        SCW     A,A
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ONE
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENTA

Figure 31: SUBC.APL Listing
        LI      FP3,4
        LI      FP1,0
IDENT   LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ZERO    LI32    C,X'80000000'
        SCW     A,B
        SCW     A,D
        SCW     A,E
        SCW     A,F
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ZERO
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENT
        B       0(R7)
        END
        END

r 1254  0.547  0.668  27


Figure 31: Continued
pr SUBD.APL

SUBD.APL  10/14/76  1127.0 edt Thu

SUB6    START
        ENTRY   SUBD
        EXTRN   COUNTER,FOUR,VALUE,CONST3
        EXTRN   CONST1,CONST,CONST0,CONST2
        ORG     0,R
A       DF      0,32
B       DF      32,32
D       DF      64,32
E       DF      96,32
F       DF      128,32
G       DF      160,32
H       DF      192,32
SUBD    LR      FP1,CONST3
        LI      FP2,0
        CLR     X
        CLR     Y
        LOOP,32 MOVEE
        GEN,32  X'63C6A0ED'
        GEN,32  X'42008840'
        GEN,32  X'63C7A0FD'
        GEN,32  X'42E088A0'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40C08852'
        GEN,32  X'40009943'
        GEN,32  X'400088A2'
        GEN,32  X'40808852'
        GEN,32  X'40009943'
MOVEE   GEN,32  X'1B600002'
        NOP
        LI      FP3,4
        LI      FP1,0
IDENTA  LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ONE     LI32    C,X'01400000'
        SCW     A,A
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ONE
        INCR    FP1
        DECR    FP3

Figure 33: SUBD.APL Listing



        BNZ,FP3 IDENTA
        LI      FP3,4
        LI      FP1,0
IDENT   LR      FP2,CONST0
        DECR    FP2
        LI      FPE,4
ZERO    LI32    C,X'80000000'
        SCW     A,B
        SCW     A,D
        SCW     A,E
        SCW     A,F
        RPT,64
        INCR    FP2
        DECR    FPE
        BNZ,FPE ZERO
        INCR    FP1
        DECR    FP3
        BNZ,FP3 IDENT
        B       0(R7)
        END
        END

r 1128  0.323  1.198  57

Figure 33: Continued


